Query Languages for XML

Author: Emery Cummings

1 Query Languages for XML: XPath

2 The XPath Data Model
Given an XML document, XPath operates on it and produces values that are sequences of items.
An item is either a value of primitive type (e.g., an integer or a string) or a node (defined next).

3 Principal Kinds of Nodes
Documents represent entire XML documents; a document is identified by a local path name or a URL.
Elements are pieces of a document consisting of an opening tag, its matching closing tag (if any), and everything in between.
Attributes are names that are given values inside opening tags.

4 Document Nodes
Formed by doc(filename), where filename can be a local name or a URL.
Example: doc("univ.xml") or doc("/mydir/univ.xml")
All XPath queries refer to a doc node, either explicitly or implicitly.
Example: key definitions in XML Schema have XPath expressions that refer to the document described by the schema.

5 Running Example
[DTD listing for the running example; not recoverable from the extracted text.]

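6 Example Document
[XML listing of the example document; markup lost in extraction. Surviving content: James Bond, 80, 98, DB. Callouts in the listing: "An element node", "An attribute node".]
The document node is all of this, plus the header.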

7 Nodes as Semistructured Data
[Tree diagram of univ.xml. Node labels: UNIV, STUDENT, COURSE, NAME, MARK. Attribute labels: sno = "007", cno = "CS123", theCNO = "CS456", theCNO = "CS123". Leaf values: James Bond, 80, 98, DB.]

8 Paths in XML Documents
XPath is a language for describing paths in XML documents.
The result of the described path is a sequence of items.

9 Path Expressions
Simple path expressions are sequences of slashes (/) and tags, starting with /.
Example: /UNIV/STUDENT/MARK
Construct the result by starting with just the doc node and processing each tag in turn from the left.

10 Evaluating a Path Expression
Assume the first tag is the root. Processing the doc node by this tag results in a sequence consisting of only the root element, e.g., /UNIV.
Suppose we have obtained a sequence of items from processing the previous tags, and the next tag is X. For each item that is an element node, replace the element by all its subelements with tag X.
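The evaluation rule above can be sketched in Python with the standard xml.etree.ElementTree module. This is only an illustration: the sample document is a hypothetical reconstruction modeled on the running example (the second student is invented), and eval_simple_path is a helper name introduced here, not part of any XPath library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
DOC = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK><MARK theCNO="CS456">80</MARK></STUDENT>'
    '<STUDENT sno="008"><NAME>Jane Doe</NAME>'
    '<MARK theCNO="CS123">85</MARK></STUDENT>'
    '<COURSE cno="CS123"><NAME>DB</NAME></COURSE>'
    '</UNIV>'
)

def eval_simple_path(root, path):
    """Evaluate an absolute path such as /UNIV/STUDENT/MARK tag by tag."""
    tags = path.strip("/").split("/")
    # First tag: the doc node yields the root element if the tag matches.
    items = [root] if root.tag == tags[0] else []
    for tag in tags[1:]:
        # Replace each element by all of its subelements with this tag.
        items = [child for item in items for child in item if child.tag == tag]
    return items

marks = eval_simple_path(DOC, "/UNIV/STUDENT/MARK")
print([m.text for m in marks])  # ['98', '80', '85']
```

The result keeps document order because each step walks the children of the previous items from left to right.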

11 Example: /UNIV
Result: one item, the UNIV element of the example document.

12 Example: /UNIV/STUDENT
Result: the STUDENT element of the example document, followed by all the other STUDENT elements.

13 Example: /UNIV/STUDENT/MARK
Result: the MARK elements of the STUDENT in the example document, followed by the MARK elements of all the other STUDENTs.

14 Relative Path
We can use XPath expressions that are relative to the current node or sequence of nodes. Such expressions do not start with /.
Example: if we have arrived at node /UNIV, then we can use the relative path STUDENT/NAME or COURSE to reach elements below it.
Lu Chaojun, SJTU

15 Attributes in Paths
Instead of going to subelements with a given tag, you can go to an attribute of the elements you already have.
An attribute is indicated by @ in front of its name.

16 Evaluating Attributes
When a path expression ends in an attribute, the result is typically a sequence of values of primitive type.

17 Example: /UNIV/STUDENT/MARK/@theCNO
Result: the theCNO attributes of the MARK elements in the example document contribute "CS123" and "CS456", followed by the other theCNO values.
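A small Python sketch of the same attribute step, assuming a hypothetical sample document shaped like the running example. ElementTree supports only a subset of XPath and cannot end a path in @theCNO, so the sketch selects the MARK elements and then reads the attribute from each.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK><MARK theCNO="CS456">80</MARK>'
    '</STUDENT></UNIV>'
)

# Step 1: the element part of the path, relative to the root element UNIV.
marks = root.findall("./STUDENT/MARK")

# Step 2: the attribute step. Elements without the attribute contribute nothing.
cnos = [m.get("theCNO") for m in marks if m.get("theCNO") is not None]
print(cnos)  # ['CS123', 'CS456']
```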

18 Paths that Begin Anywhere
If the path begins with //X, then the first step can begin at the root or any subelement of the root, as long as the tag is X.
In fact, //X can appear anywhere in a path, e.g., /UNIV//NAME.
// is shorthand for a kind of axis (see the following slides).
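For illustration, ElementTree accepts .//X relative to an element, which behaves like /UNIV//X when the element is the root UNIV. The document below is a hypothetical sample, not the original univ.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK></STUDENT>'
    '<COURSE cno="CS123"><NAME>DB</NAME></COURSE></UNIV>'
)

# /UNIV//NAME: NAME elements at any depth below the root element.
names = [n.text for n in root.findall(".//NAME")]
print(names)  # ['James Bond', 'DB']
```

Both the student name and the course name are returned, since the path constrains only the tag and not the depth.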

19 Axes: Modes of Navigation
In general, path expressions allow us to start at the root and execute steps to find a sequence of nodes.
At each step, we may follow any one of several axes.
The default axis is child:: (go to all the children of the current set of nodes). Because it is the default, it can be omitted: /X means /child::X.

20 Example: Axes
/UNIV/STUDENT is really shorthand for /child::UNIV/child::STUDENT.
@ is really shorthand for the attribute:: axis.
Thus, /UNIV/STUDENT/@sno is shorthand for /child::UNIV/child::STUDENT/attribute::sno

21 More Axes
Some other useful axes are:
parent:: = parent(s) of the current node(s). Shorthand: ..
self:: = the current node(s). Shorthand: . (the dot)
descendant-or-self:: = the current node(s) and all descendants. Shorthand: //
ancestor::, ancestor-or-self::, following-sibling::, etc.
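The axes can be imitated in Python as follows. This is a sketch under assumptions: ElementTree elements keep no parent pointer, so a parent map is built first, and parent_of and ancestors are names introduced here for illustration. The document is hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK></STUDENT></UNIV>'
)

# Map every element to its parent so that we can navigate upwards.
parent_of = {child: parent for parent in root.iter() for child in parent}

mark = root.find("./STUDENT/MARK")

# parent:: axis (shorthand ..)
student = parent_of[mark]

# descendant-or-self:: axis: the node itself plus all descendants.
below = [e.tag for e in student.iter()]

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found

print(student.get("sno"))                # 007
print(below)                             # ['STUDENT', 'NAME', 'MARK']
print([a.tag for a in ancestors(mark)])  # ['STUDENT', 'UNIV']
```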

22 Wildcard *
A star (*) in place of a tag represents "any tag".
Example: /*/*/NAME represents all NAME elements at the third level of nesting.
@* represents "any attribute".

23 Example: /UNIV/*
Result: the STUDENT element of the example document, all other STUDENT elements, the COURSE element, and all other COURSE elements.

24 Selection Conditions
A condition inside [...] may follow a tag. If so, then only paths that have that tag and also satisfy the condition are included in the result of a path expression.
Sequence comparisons have an implied "there exists" sense: two sequences are related if any pair of items, one from each sequence, are related by the given comparison operator.

25 Example: Selection Condition
/UNIV/STUDENT/MARK[. < 90]
Result: the condition that the MARK be < 90 makes the mark of 80, but not the CS123 mark (98), part of the result.
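ElementTree's XPath subset has no numeric comparisons, so a Python sketch of this selection applies the condition as a filter after the path step. The sample document is hypothetical and only mirrors the marks mentioned on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK><MARK theCNO="CS456">80</MARK>'
    '</STUDENT></UNIV>'
)

# /UNIV/STUDENT/MARK[. < 90]: keep a MARK only if its own value is below 90.
low_marks = [m for m in root.findall("./STUDENT/MARK") if float(m.text) < 90]
print([(m.get("theCNO"), m.text) for m in low_marks])  # [('CS456', '80')]
```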

26 Example: Attribute in Selection
/UNIV/STUDENT/MARK[@theCNO = "CS123"]
Result: now the CS123 MARK element is selected, along with any other marks for CS123.

27 Other Forms of Conditions
Here are some useful forms of conditions:
X[i] = true for the ith X child of its parent
X[T] = true for X having a subelement with tag T
X[@A] = true for X having attribute A
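These condition forms, and the attribute comparison from the previous slide, are all within the XPath subset that ElementTree supports, so they can be tried directly in Python. The document is hypothetical sample data in which the second student has no MARK.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><NAME>James Bond</NAME>'
    '<MARK theCNO="CS123">98</MARK><MARK theCNO="CS456">80</MARK></STUDENT>'
    '<STUDENT sno="008"><NAME>Jane Doe</NAME></STUDENT>'
    '</UNIV>'
)

# X[i]: the second STUDENT child of UNIV.
second = root.findall("./STUDENT[2]")
# X[T]: STUDENT elements that have a MARK subelement.
with_mark = root.findall("./STUDENT[MARK]")
# X[@A]: MARK elements that have a theCNO attribute.
with_cno = root.findall("./STUDENT/MARK[@theCNO]")
# Attribute comparison: marks for CS123 only.
cs123 = root.findall("./STUDENT/MARK[@theCNO='CS123']")

print([s.findtext("NAME") for s in second])     # ['Jane Doe']
print([s.findtext("NAME") for s in with_mark])  # ['James Bond']
print(len(with_cno))                            # 2
print([m.text for m in cs123])                  # ['98']
```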

28 Query Languages for XML: XQuery

29 XQuery
XQuery extends XPath to a query language that has power similar to SQL.
It uses the same sequence-of-items data model.
XQuery is an expression (functional) language: as in relational algebra, any XQuery expression can be an argument of any other XQuery expression.

30 More About Item Sequences
XQuery will sometimes form sequences of sequences. All sequences are flattened.
Example: (1 2 () (3 4)) = (1 2 3 4), where () is the empty sequence.
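Flattening can be pictured with a short Python sketch in which nested lists stand in for nested sequences. The function name flatten is introduced here for illustration only.

```python
def flatten(seq):
    """Flatten nested lists the way XQuery flattens nested sequences."""
    flat = []
    for item in seq:
        if isinstance(item, (list, tuple)):
            # A nested sequence contributes its items, not itself.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

print(flatten([1, 2, [], [3, 4]]))  # [1, 2, 3, 4]
```

The empty sequence simply disappears, which matches the example on the slide.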

31 FLWR Expressions
One or more for and/or let clauses.
Then an optional where clause.
Exactly one return clause.

32 Semantics of FLWR Expressions
Each for creates a loop. let produces only a local definition.
At each iteration of the nested loops, if any, evaluate the where clause.
If the where clause returns TRUE, invoke the return clause, and append its value to the output.
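The loop semantics can be mimicked with ordinary nested loops in Python. The data, the threshold 85 and the variable names below are hypothetical and chosen only to make the sketch produce visible output. Each for clause becomes a loop, the let becomes a plain assignment, and where/return become a test and an append.

```python
# Hypothetical data: (sno, marks) pairs standing in for STUDENT elements.
students = [("007", [98, 80]), ("008", [85])]

output = []
for sno, marks in students:        # for $s in .../STUDENT
    best = max(marks)              # let $best := max($s/MARK)  (no iteration)
    for mark in marks:             # for $m in $s/MARK
        if mark >= 85:             # where $m >= 85
            output.append((sno, mark, best))  # return: append to the output

print(output)  # [('007', 98, 98), ('008', 85, 85)]
```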

33 for Clauses
for var in exp, ...
Variables begin with $.
A for-variable takes on each item in the sequence denoted by the expression, in turn.
Whatever follows this for is executed once for each value of the variable.

34 Example: for
for $c in doc("univ.xml")/UNIV/COURSE/@cno
return <CNO>{$c}</CNO>
$c ranges over the cno attributes of all courses in our example document.
The return clause means: "Expand the enclosed string by replacing variables and path expressions by their values."
Result is a sequence of CNO elements: <CNO>CS123</CNO> <CNO>CS456</CNO> . . .

35 Use of Braces
When a variable name like $x, or an expression, could be text, we need to surround it by braces to avoid having it interpreted literally.
Example: <A>$x</A> is an A-element with value "$x". <A>{$x}</A> is correct.
But return $x is unambiguous.
You cannot return an untagged string without quoting it, as in return "$x".

36 let Clauses
let var := exp, ...
The value of the variable becomes the sequence of items defined by the expression.
Note that let does not cause iteration; for does.

37 Example: let
let $d := document("univ.xml")
let $c := $d/UNIV/COURSE/@cno
return {$c} [element tags around {$c} lost in extraction]
Returns one element with all the course numbers, like: CS123

38 order by Clauses
FLWR is really FLWOR: an order-by clause can precede the return.
Form: order by <expression>, with optional ascending or descending.
The expression is evaluated for each assignment to variables. Its value determines placement in the output sequence.

39 Example: order by
List marks, lowest first.
let $d := document("univ.xml")
for $p in [path expression selecting MARK elements; lost in extraction]
order by $p
return $p
The for clause generates bindings for $p to MARK elements.
Order those bindings by the values inside the elements (automatic coercion).
Each binding is evaluated for the output. The result is a sequence of MARK elements.
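As a rough Python analogue of order by, the sketch below binds a variable to MARK elements and sorts the bindings by the numeric value inside each element. The document and the path ./STUDENT/MARK are assumptions made for this illustration; the conversion with float plays the role of the automatic coercion.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><MARK theCNO="CS123">98</MARK>'
    '<MARK theCNO="CS456">80</MARK></STUDENT>'
    '<STUDENT sno="008"><MARK theCNO="CS123">85</MARK></STUDENT>'
    '</UNIV>'
)

# for $p in <MARK elements> order by $p return $p
bindings = root.findall("./STUDENT/MARK")
ordered = sorted(bindings, key=lambda p: float(p.text))

# The result is still a sequence of MARK elements, now in ascending order.
print([p.text for p in ordered])  # ['80', '85', '98']
```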

40 Predicates
Normally, conditions imply existential quantification.
E.g., for two sequences of items to be equal, we have only to find any pair of items, one from each side, that are equal.

41 Strict Comparisons
To require that the things being compared are sequences of only one element, use the comparison operators eq, ne, lt, le, gt, ge.
Example: $x/NAME eq "James Bond" is true only if $x/NAME is a single item whose value is "James Bond".
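The difference between the existential = and the strict eq can be sketched in Python with lists standing in for sequences. general_eq and value_eq are hypothetical helper names. Treating an empty operand as yielding None and a longer operand as an error reflects my understanding of XQuery value comparisons, and should be read as an assumption rather than as part of the slides.

```python
def general_eq(left, right):
    """Existential '=': true if any pair of items, one per side, is equal."""
    return any(x == y for x in left for y in right)

def value_eq(left, right):
    """Strict 'eq': both operands must be sequences of exactly one item."""
    if len(left) == 0 or len(right) == 0:
        return None  # assumption: an empty operand gives an empty result
    if len(left) > 1 or len(right) > 1:
        raise TypeError("eq requires single-item operands")
    return left[0] == right[0]

names = ["James Bond", "Jane Doe"]
print(general_eq(names, ["James Bond"]))       # True
print(value_eq(["James Bond"], ["James Bond"]))  # True
```

Calling value_eq(names, ["James Bond"]) raises TypeError in this sketch, because the left operand has two items.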

42 Boolean Values in XQuery
The effective boolean value (EBV) of an expression is:
The actual value if the expression is of type boolean.
FALSE if the expression evaluates to 0, "" [the empty string], or () [the empty sequence].
TRUE otherwise.
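The three rules can be written down as a small Python function. This sketch follows the simplified rule exactly as stated on the slide, with lists standing in for sequences and a one-item list treated as the item itself. The name ebv is introduced here, and an actual XQuery processor may be stricter in cases the slide lumps under "TRUE otherwise".

```python
def ebv(value):
    """Effective boolean value, following the simplified rule on the slide."""
    if isinstance(value, (list, tuple)):
        if len(value) == 0:
            return False          # () is FALSE
        if len(value) == 1:
            return ebv(value[0])  # a one-item sequence is just the item
        return True               # "TRUE otherwise"
    if isinstance(value, bool):
        return value              # booleans keep their actual value
    if value == 0 or value == "":
        return False              # 0 and the empty string are FALSE
    return True

print(ebv(0), ebv(""), ebv([]))       # False False False
print(ebv(False), ebv("0"), ebv(42))  # False True True
```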

43 Comparison of Elem. and Values
When an element is compared to a primitive value, the element is treated as its value, if that value is atomic.
Example: /UNIV/STUDENT[@sno = "007"]/MARK[@theCNO = "CS123"] eq "80" is true if student 007 got 80 for CS123.

44 Comparison of Two Elements
It is insufficient that two elements look alike.
Example: ... eq ... is false.
For elements to be equal, they must be the same, physically, in the implied document.

45 Getting Data From Elements
Suppose we want to compare the values of elements, rather than their location in documents.
To extract just the value (e.g., the mark itself) from an element E, use data(E).

46 Eliminating Duplicates
Use function distinct-values applied to a sequence.
Subtlety: this function strips tags away from elements and compares the string values, but it doesn't restore the tags in the result.
Example:
distinct-values(
  let $d := doc("univ.xml")
  return $d/UNIV/STUDENT/MARK)
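A Python sketch of the same idea, on a hypothetical sample document in which two students share the mark 80. The function name distinct_values is introduced here. Keeping the first occurrence of each value is a choice made for the sketch, not something the slide specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><MARK theCNO="CS123">98</MARK>'
    '<MARK theCNO="CS456">80</MARK></STUDENT>'
    '<STUDENT sno="008"><MARK theCNO="CS123">80</MARK></STUDENT>'
    '</UNIV>'
)

def distinct_values(elements):
    """Strip the tags, compare the string values, and drop duplicates."""
    seen = []
    for element in elements:
        value = element.text
        if value not in seen:
            seen.append(value)
    return seen  # plain values: the MARK tags are not restored

print(distinct_values(root.findall("./STUDENT/MARK")))  # ['98', '80']
```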

47 Quantifier Expressions
some $x in E1 satisfies E2
Evaluate the sequence E1. Let $x (any variable) be each item in the sequence, and evaluate E2.
Return TRUE if E2 has EBV TRUE for at least one $x.
Analogously: every $x in E1 satisfies E2
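Python's built-in any and all behave like some and every, which makes for a compact illustration. The marks and the threshold 90 below are hypothetical values.

```python
# Hypothetical sequence of mark values.
marks = [98, 80, 85]

# some $m in marks satisfies $m > 90
some_above_90 = any(m > 90 for m in marks)

# every $m in marks satisfies $m > 90
every_above_90 = all(m > 90 for m in marks)

print(some_above_90, every_above_90)  # True False
```

Over an empty sequence, any gives False and all gives True, since there is no item that could satisfy or violate the condition.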

48 Aggregations
Take a sequence as argument, and return count, sum, max, etc.
Example:
let $d := doc("univ.xml")
for $s in $d/UNIV/STUDENT
where count($s/MARK) > 100
return $s
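The same query shape can be sketched in Python on a hypothetical sample document. The threshold is lowered from 100 to 1 here, purely so that the tiny sample returns something.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><MARK theCNO="CS123">98</MARK>'
    '<MARK theCNO="CS456">80</MARK></STUDENT>'
    '<STUDENT sno="008"><MARK theCNO="CS123">85</MARK></STUDENT>'
    '</UNIV>'
)

# for $s in /UNIV/STUDENT where count($s/MARK) > 1 return $s
selected = [s for s in root.findall("./STUDENT") if len(s.findall("MARK")) > 1]
print([s.get("sno") for s in selected])  # ['007']

# Other aggregates work on the same kind of sequence.
highest = max(float(m.text) for m in root.iter("MARK"))
print(highest)  # 98.0
```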

49 Branching Expressions
if (E1) then E2 else E3 is evaluated by: compute the EBV of E1. If true, the result is E2; else the result is E3.
Example: if ($x/@sno eq "007") then $x/NAME else ()
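A Python sketch of a branching expression, assuming $x ranges over STUDENT elements of a hypothetical sample document. The helper name name_if_007 is introduced here. The empty list plays the role of (), and it vanishes when the per-student results are flattened into one sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV>'
    '<STUDENT sno="007"><NAME>James Bond</NAME></STUDENT>'
    '<STUDENT sno="008"><NAME>Jane Doe</NAME></STUDENT>'
    '</UNIV>'
)

def name_if_007(student):
    """if (sno eq "007") then the NAME subelements else the empty sequence."""
    return student.findall("NAME") if student.get("sno") == "007" else []

# Evaluate the branch once per student and flatten the results.
result = [n.text for s in root.findall("./STUDENT") for n in name_if_007(s)]
print(result)  # ['James Bond']
```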

50 Query Languages for XML: XSLT

51 XSLT
XSLT (Extensible Stylesheet Language Transformations) is another language to process XML documents.
Originally intended as a presentation language: transform XML into an HTML page that could be displayed.
It can also extract data from XML or transform XML -> XML, thus serving as a query language.

52 XSLT Programs
Like XML Schema, an XSLT program is itself an XML document, usually called a stylesheet.
XSLT has a special namespace of tags, usually indicated by xsl:.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

53 Templates
A stylesheet has one or more templates.
A template describes a set of elements (of the document being processed) and what should be done with them.
The form: <xsl:template match="path"> ... </xsl:template>
Attribute match gives an XPath expression describing how to find the nodes to which the template applies.

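54 Example Template
[Template listing; markup lost in extraction. The match pattern matches only the root, and the template body contains the text "This is a document."]
The output of the template is an HTML page.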

55 Obtaining Values from XML
The output usually depends on the data of the input XML.
Use xsl:value-of to extract data.

56 Recursive Use of Templates
An XSLT document usually contains many templates. Start by finding the first one that applies to the root.
Any template can have within it <xsl:apply-templates/>, which causes the template-matching to apply recursively from the current node.

57 Apply-Templates
Attribute select gives an XPath expression describing the subelements to which we apply templates.
Example: <xsl:apply-templates select="A/B"/> says to follow all paths tagged A, B from the current node and apply all templates there.
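The recursive template mechanism can be imitated in Python. This is not XSLT and does not use an XSLT processor: templates are modeled as plain functions keyed by tag, apply_templates is a helper name introduced here, and nodes without a template are simply skipped, whereas a real XSLT processor has built-in default rules. The document and the UL/LI output tags are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data modeled on the running example.
root = ET.fromstring(
    '<UNIV><STUDENT sno="007"><NAME>James Bond</NAME></STUDENT>'
    '<STUDENT sno="008"><NAME>Jane Doe</NAME></STUDENT>'
    '<COURSE cno="CS123"><NAME>DB</NAME></COURSE></UNIV>'
)

def apply_templates(nodes, templates):
    """Run the template registered for each node's tag and join the outputs."""
    pieces = []
    for node in nodes:
        template = templates.get(node.tag)
        if template is not None:
            pieces.append(template(node, templates))
    return "".join(pieces)

def univ_template(node, templates):
    # Plays the role of apply-templates with select="STUDENT".
    return "<UL>" + apply_templates(node.findall("STUDENT"), templates) + "</UL>"

def student_template(node, templates):
    # Plays the role of value-of with select="NAME".
    return "<LI>" + node.findtext("NAME") + "</LI>"

TEMPLATES = {"UNIV": univ_template, "STUDENT": student_template}

html = apply_templates([root], TEMPLATES)
print(html)  # <UL><LI>James Bond</LI><LI>Jane Doe</LI></UL>
```

The COURSE element produces no output because the select step in univ_template never reaches it.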

58 Example: Apply-Templates

59 Iteration
Loop within a template: <xsl:for-each select="path"> ... </xsl:for-each> executes the body of the for-each at each child of the current node that is reached by the path.

60 Conditionals
Branching: <xsl:if test="boolean expression"> ... </xsl:if> executes the body if and only if the boolean expression is true.

61 End