1 CMPE 180-38 Database Systems Workshop June 13 Class MeetingDepartment of Computer Engineering San Jose State University Summer 2017 Instructor: Ron Mak
2 Extract, Transform, and Load (ETL)
3 Extract, Transform, and Load (ETL), cont’d
4 Extract, Transform, and Load (ETL), cont’d
5 Extract, Transform, and Load (ETL), cont’d
6 XML The Extensible Markup Language (XML) is an industry standard to:Store information. Describe the structure of that information. Exchange the information among different applications in a programming language-independent way. Not all data comes from relational databases! XML data is “semi-structured”. Self-describing: No rigid schema.
7 XML Components An XML element has an opening and a closing tag:The closing tag is mandatory. An XML element may be nested in another element (child elements): XML documents that obey all the syntax rules are “well formed”.
8 XML Components, cont’d An XML element may have content:An element can have both content and child elements. An XML element may have attributes. Attribute values must be quoted: Attribute names must be unique within an element.
9 XML Components, cont’d An empty element has no content and no child elements. An empty element can be “self closed”. Comments:
10 XML Components, cont’d Begin every XML document with the processing instruction: Every XML document must have a single root element:
11 XML Namespaces Prevent element name clashes.An element name can be in the scope of a namespace. A namespace name must be unique. Use a URI (uniform resource identifier) as the name. Start with your unique domain name. A URL is a common form of URI. The URL doesn’t have to point to an actual file.
12 XML Namespaces, cont’d Declare a namespace in an element tag.The scope of the namespace is that element and its children. Example: A namespace declared in the root element has the entire XML document in its scope.
13 XML Namespaces, cont’d Declare the default namespace.Example: All elements in its scope are in the default namespace. Elements not in any namespace scope are “in no namespace”.
14 XML Namespaces, cont’d Declare a non-default namespace with a prefix.Example: Prefix element names that are in the namespace scope (e.g., au). The element containing the declaration is itself in the scope. The prefix is considered part of the element name.
15 XML Namespaces, cont’d Nested namespaces:Why both the book and author namespaces? Prevent the book title and the author title name clash.
16 XML Namespaces, cont’d Alternate:
17 Common XML Tools XPath XSLT“Path expressions” to locate a specific node (element, attribute, or content) or node sets within an XML document. Functions to compare, count, do arithmetic, extract substrings, etc. XSLT Extensible Style Language for Transformation Transform XML from one form to another (such as to HTML).
18 Common XML Tools, cont’dDTD Document Type Definition. Specify the schema of XML documents. The DTD is itself not an XML document. Validate an XML document against its schema.
19 Common XML Tools, cont’dXML Schema XML Schema Definition (XSD). Another way to specify the schema of XML documents. An XML Schema is itself an XML document. A valid XML document is an instance of its schema. XML schema : XML document Java class : Java object XQuery A query language for data stored as XML.
20 Commercial XML EditorsoXygen Windows and Mac 30 days free trial, $99 academic price XMLSpy Windows only 30 days free trial, $499 professional edition
21 Eclipse XML Plugins Web Tools Platform
22 NetBeans XML Plugins Schema editor https://blogs.oracle.com/geertjan/entry/xml_schema_editor_in_netbeans Query window
23 XPath XPath views an XML document as a node tree.Everything in the document is a node. element, attribute, text content Every node is related to another node. parent, child, ancestor, descendant, sibling
24 Location Paths An XPath expression is a location path that walks the tree starting from the root in order to select a single node or a set of nodes. The selection is based on the node relations and conditional tests on attribute values. XPath expressions look like Unix file paths. / represents the root node of the document. Add more element names separated by / to step down the tree.
25 Location Path Examplescatalog.xml
26 Location Path Examples, cont’d
27 Location Path Examples, cont’d
28 Location Path Examples, cont’d
29 Location Path Examples, cont’d
30 Location Path Examples, cont’d
31 Location Path Examples, cont’d
32 XPath Axes child:: descendant:: attribute:: self::Shorthand: just the element name of the child descendant:: Shorthand: // attribute:: self:: descendant-or-self:: following-sibling:: preceding-sibling:: following:: parent:: ancestor:: preceding:: ancestor-or-self:: namespace::
33 XPath Expressions XPath expressions can also include: arithmeticcomparisons let (local variables) if, for, some, every
34 XPath Functions XPath functions include: count()format-number(), round-number() substring-before(), substring-after() contains() string-length() translate()
35 Break
36 XML Schema XML Schema Definition (XSD).Specify the structure of XML data. Application-specific. Replacement for Document Type Definition (DTD).
37 Well-Formed vs. Valid XMLAn XML document is well-formed if it has correct XML syntax. An XML document is valid if its contents conform to its schema. The schema specifies what elements and attributes a valid document must have, and in what order. The schema specifies the data type and format of the content.
38 XML Schema Types Simple type Complex typesAn XML element with only text content. No attributes and no child elements. Complex types Four complex types: Complex type Attributes Children Text content Text only yes - Element only Empty element Mixed content
39 Simple Type DefinitionNo attributes and no children. Example: Element: Some basic types: xs:string xs:integer xs:decimal xs:boolean false or true (or: 0 or 1) xs:date format YYYY-MM-DD
40 Simple Type Definition, cont’dPredefined value Example: If the element is present and has content, the content must match the predefined value. If the element is present but empty, it gets the predefined value. If the element is not present, it has no value.
41 Simple Type Definition, cont’dDefault value Example: If the element is empty or not present, it gets the default value. If the element is present and has content, the content that’s there is used as the value.
42 Derived Simple Types Anonymous custom type.Derive a custom simple type from a base type. The type itself has no name. Example: It applies only to the element (birthday) for which it is defined.
43 Derived Simple Types, cont’dNamed custom type. The type has a name. Multiple elements can refer to the type by name. Example: Element:
44 Derived Simple Types, cont’dAn enumeration set of acceptable content values. Example: Element:
45 Derived Simple Types, cont’dA regular expression pattern. Example: Element: For Perl-style regular expressions, see https://www.cs.tut.fi/~jkorpela/perl/regexp.html X_876
46 Derived Simple Types, cont’dA union type. Example: Elements:
47 Derived Simple Types, cont’dA list type. Example: Therefore, in the XML document, the holidays element can have a list of date values:
48 Complex Types Four complex types: Complex type Attributes ChildrenText content Text only yes - Element only Empty element Mixed content
49 Complex Types: Text OnlyAttributes and text content, no child elements. In this example, an element of type year_type must have content that is a positive integer. It must also have an era attribute whose value is a string. Use
50 Complex Types: Element OnlyAttributes and child elements, no text content. Example:
51 Complex Types: Element Only, cont’dChoice of child elements Either a social security number, or a username followed by a password.
52 Complex Types: Empty ElementAttributes only, no text content and no child elements. Example:
53 Complex Types: Mixed ContentAttributes, child elements, and text content. Example: The elements are interspersed among the text content. Instead of
54 Attributes Attributes are simple type elements.Example: Attributes are optional by default. Add use="required" to the attribute definition to make the attribute mandatory. Add use="prohibited" to prohibit the attribute.
55 Sample XML Document catalog.xml
56 Sample XML Schema catalog.xsd
57 Sample XML Schema, cont’dcatalog.xsd
58 Complete XML Schema catalog.xsd
59 XQuery 1.0 A query language used to:Select content from an XML data source. Transform the content. Return the content in some format (XML, HTML, ...). Uses XPath expressions. Analogous to SQL for relational database tables. Compact and easy to learn. Does not use the XML syntax.
60 XQuery catalog.xml
61 XQuery Scripts XQuery queries are kept in script files.Convention: Use the .xql suffix in the file name.
62 XQuery Script ExamplesReturn the entire node tree from catalog.xml: The first line is mandatory (and won’t be shown in subsequent examples). xquery version "1.0"; doc("catalog.xml") xquery version "1.0";
63 XQuery Script Examples, cont’dReturn only the articles. Return articles whose titles contain the word “Design”. Return the articles that were written in the year 2003. doc("catalog.xml") //article doc("catalog.xml") //article/title[contains(., "Design")] doc("catalog.xml") "2003")]
64 FLWOR Expressions Pronounced “flower”. XQuery expressions can contain:For (loops) Let (local variables) Where Order Return
65 FLWOR Expression Examplesfor $art in doc("catalog.xml") //article where "2003") return $art example-1.xql
66 FLWOR Expression Examples, cont’dexample-2.xql for $art in doc("catalog.xml") //article let $d := where contains($d, "2003") order by $art/title return ($art/title, $art/author) Note the := Note the parentheses and comma! A return clause can return only one expression.
67 FLWOR Expression Examples, cont’dexample-3.xql
68 FLWOR Expression Examples, cont’ddirectory.xml
69
70 FLWOR Expression Examples, cont’dexample-4.xql
71 FLWOR Expression Examples, cont’dexample-5.xql for $art in doc("catalog.xml") //article for $per in doc("directory.xml") //person let $addr := $per/ where $art/author = $per/name order by $art/title return ($art/title, $art/author, $addr) A join of two XML documents!
72 let $addr := $per/email where $art/author = $per/name example-6.xql
73 User-Defined Functions in XQueryUse the local namespace to prevent name clashes with standard functions. declare function local:splitter($name) { let $first := substring-before($name, ' ') let $last := substring-after ($name, ' ') return (
74 User-Defined Functions in Xquery, cont’dexample-7.xql
75 XML and Relational DatabasesModern RDBM systems provide XML support. For MySQL, see https://dev.mysql.com/doc/refman/5.5/en/load-xml.html
76 XML and Relational Databases, cont’dmysql -u root -psesame --xml school mysql> select * from class;
77 XML and Relational Databases, cont’dNative XML support allows you to load and fetch XML data directly into and out of a relational database. XQuery can directly query relational database tables. If database school contains table Teacher : However, the XQuery standard (as of version 1.0) doesn’t specify how to make a connection to a database. let $teachers := collection("school.Teacher")
78 Assignment #5 Create XML data files and their XML schemas, XPath expressions, and XQuery FLWOR expressions. Use the Oxygen XML editor, or a similar tool. See
79 XSLT XSL = Extensible Stylesheet Language T = TransformationsTransform an XML document to another form: another XML document HTML plain text Uses XPath expressions.
80 XSLT Examples catalog2.xml
81 XSLT Examples, cont’d catalog2.xsd
82 XSLT Examples, cont’d
83 XSLT Examples, cont’d example-1.xslArticles
Make the root node the current node. HTML to output. Article Author
84 The Current Node As the XSLT processor works through an XML document, the current node is the node it is processing. XPath expressions can be relative to the current node. By default, the current node is the node matched by the current template.
85 The Current Node, cont’dWhile executing an xsl:for-each instruction, the current node changes during each iteration to the next node that matches the select attribute. Upon completion of the xsl:for-each instruction, the current node reverts to what it was before the instruction.
86 XSLT Examples, cont’d example-2.xslArticles
Make the root node the current node. Each article node in turn becomes the current node. After the for each loop is done, the current node reverts to the root node.
87 example-3.xsl Sort by title.Articles
Sort by title.
88 example-4.xsl Test the conditions in the given order until the first one evaluates to true.
89 XSLT Examples, cont’d example-5.xsl Within an XSL document,Summary
90 XSLT Examples, cont’d example-6.xsl No “else” part.
91 XSLT Examples, cont’d example-7.xsl
92 XSLT Examples, cont’d example-7.xsl “Call” the template thatArticles
93 XSLT Examples, cont’d example-7.xsl