1 Product Training Program: XML / XPath / XSLT Primer
2 XML, XSLT, and XPath
3 What are XML, XSLT, and XPath?
XML is the Extensible Markup Language. It is a specification for defining a document structure for organizing information. A specific XML document is said to have a Schema, or a defined structure that it is expected to follow. Specific schemas depend entirely on the intended use case of an individual XML document.
XSLT is Extensible Stylesheet Language Transformations. It is used to build a stylesheet of instructions that tells a transformation engine how to convert one XML document into another XML document. This is useful because it allows data from the first XML document to be placed within another XML document of a completely different structure.
XPath, the XML Path Language, is a mechanism for defining a path to a node within an XML document. XSLT uses XPath to specify the location of specific data in the source XML document when assigning values to nodes in the target XML document.
4 Coding Not Required, Understanding Is
PilotFish’s Data Mapper removes the need to memorize a programming language and write complex syntax. It does this through a graphical tool that allows XSLT documents to be built with simple drag-and-drop operations. While coding is not required, an understanding of the XSLT concepts described here is, in order to know how to build the structures in the graphical tool. This guide goes in depth into how XSLT is coded. Every coding concept covered here has a graphical tool available in the Data Mapper that performs the same function.
5 Sample Document
6 Sample Document
All upcoming slides will reference the previous sample XML document.
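The sample document itself is not reproduced in this text. The following is a hypothetical stand-in, inferred only from names and values mentioned elsewhere in this primer (Root, Fish, FirstName, Type, "Halibut", and a Type beginning with "Gold"). The id attribute and the FirstName values are assumptions added for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical stand-in for the sample document; not the original -->
<Root>
  <Fish id="1">
    <FirstName>Goldie</FirstName>
    <Type>Goldfish</Type>
  </Fish>
  <Fish id="2">
    <FirstName>Hal</FirstName>
    <Type>Halibut</Type>
  </Fish>
</Root>
```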
7 XML
8 XML Document Structure
The data in XML is organized into a series of Nodes arranged in a clear hierarchy, with each node having a single parent and an unlimited number of possible siblings and children. There are three main types of Nodes in XML: Element nodes are the “primary” nodes, represented by angle brackets (<>). Example:
9 XML Namespaces
Because of the potential for element naming clashes (i.e., multiple elements named FirstName), XML documents frequently use namespaces. A namespace is a unique string of text that the creator can guarantee no one else will ever use. To back this guarantee, web URLs for a site that the creator owns are generally used.
Namespaces are generally assigned at the root element, using the xmlns (XML Namespace) declaration. Example: xmlns="http://www.namespace.com/mynamespace"
If a single namespace is being used, it can be declared the way shown above. Multiple namespaces will use prefixes. A prefix is a short bit of text that is placed before each element name. It is treated as a stand-in for the full namespace, so the parser qualifies each element carrying that prefix with the full namespace. A namespace prefix does not need the same uniqueness guarantee as the namespace itself; it only needs to be unique within the document in which it is used.
Declaring a namespace with a prefix in the root element: xmlns:fish="http://www.namespace.com/mynamespace"
Using a namespace with a prefix on elements within the document:
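A minimal sketch of prefixed elements, using the fish prefix and namespace URL from the declaration above. The element names and the text value are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The prefix is declared once on the root and then qualifies every element -->
<fish:Root xmlns:fish="http://www.namespace.com/mynamespace">
  <fish:Fish>
    <fish:FirstName>Goldie</fish:FirstName>
  </fish:Fish>
</fish:Root>
```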
10 XPath
11 XPath Axes
XPath locations are typically composed of axes (plural of “axis”). These are similar to file system paths. Expressions are always evaluated against a context, or “current node”, so the axes are always relative to that context. Axes are written with a series of symbols that represent the flow of the path through the XML document. They include:
/ - The “child-of” symbol. This divides the name of a parent element from that of a child element. Example: Fish/FirstName This symbol, if used as the first character of an XPath expression with no preceding name, refers to the root of the entire document. Example: /Root/Fish/FirstName
// - The descendant-or-self axis. The node that follows is expected to be either the current node or a descendant of the current node, any number of levels deep. Example: //FirstName
. - A single dot is shorthand for the current node.
.. - Two dots are shorthand for the parent node. They can be chained together to move up the document. Example: ../../Root/Fish/FirstName
@ - The “at” symbol represents an attribute node, and is always followed by the attribute’s name. Example:
12 XPath Axes (Continued)
XPath includes additional axes with readable names that cover the same steps as the symbolic axes as well as more advanced ones. These axes include: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, and self.
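The symbols above can be tried out in a small stylesheet. This is a sketch, not taken from the original slides: it assumes a source document with a Root element containing Fish elements, each with FirstName and Type children and a hypothetical id attribute. The output element names are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Inside this template the context node is the Root element -->
  <xsl:template match="/Root">
    <Paths>
      <!-- /  as a separator: FirstName children of Fish children -->
      <Child><xsl:value-of select="Fish/FirstName"/></Child>
      <!-- /  as the first character: start from the document root -->
      <Absolute><xsl:value-of select="/Root/Fish/FirstName"/></Absolute>
      <!-- // : FirstName elements at any depth -->
      <AnyDepth><xsl:value-of select="count(//FirstName)"/></AnyDepth>
      <!-- .  : the current node -->
      <Current><xsl:value-of select="name(.)"/></Current>
      <!-- .. : from FirstName, step up to its parent Fish, then down to Type -->
      <Parent><xsl:value-of select="Fish/FirstName/../Type"/></Parent>
      <!-- @  : the assumed id attribute of Fish -->
      <Attribute><xsl:value-of select="Fish/@id"/></Attribute>
    </Paths>
  </xsl:template>
</xsl:stylesheet>
```

In XSLT 1.0, value-of on a path that selects several nodes outputs only the first one in document order, so most of these lines report on the first Fish.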
13 XPath Namespaces
XPath needs to be aware of any namespaces being used in the XML document in order to find nodes properly. Even if the XML document doesn’t use a namespace prefix, it is recommended to use one with XPath. Generally, XPath evaluation engines (including XSLT) provide the means to declare namespaces independently of the document being evaluated.
Example of XPath with prefix: /fish:Root/fish:Fish/fish:FirstName
If the namespace isn’t known, or can otherwise be ignored, XPath has a function, local-name(), that returns only the local part of a node’s name. Used in a predicate, it allows searching purely by local name. Example: //*[local-name() = 'FirstName']
14 XPath Functions
XPath implementations contain a set of useful functions, which can perform more advanced operations. Function calls in XPath are similar to those in most programming languages: functionName(parameter, parameter, etc…) Functions return a result of a specific type, depending on the function.
One common function is the substring() function, which has this signature: String substring(expression, index, [length])
Breakdown:
This function returns a value of type String.
The function is declared with the name “substring”.
It requires two arguments: the XPath expression for the value to substring, and the index at which to start the substring. Indices start at 1.
There is a third, optional argument, length, which determines how long the substring should be. By default, the substring runs to the end of the original String.
Using our sample document, if the context node is the first Fish element, and we want to grab the “Gold” part of the Type element, we would write the function like this: substring(Type,1,4)
15 XPath Functions (Continued)
There are many XPath functions, and they all have drag-and-drop components in the PilotFish Data Mapper to access them. Many online resources describe the XPath functions in more detail.
16 XPath Predicates
XPath expressions can contain the equivalent of a “where” clause, called a predicate. Predicates can be used anywhere in an XPath expression to express a condition at points where the expression could return more than one possible result. The syntax for a predicate is: axis[expression], where “expression” is implicitly evaluated and should return a boolean (true/false) value.
Expressions which are numeric represent indices, and are used when an expression returns more than one node. Indices start at 1. This example selects the second Fish element: /Root/Fish[2]
Otherwise, expressions include a boolean test, usually involving attributes or child elements of the element being tested. This example selects the Fish element with the Type “Halibut”: /Root/Fish[Type = 'Halibut']
Expressions can be joined using “and” or “or”.
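A sketch of both approaches inside an XSLT stylesheet, which is one of the places where a prefix can be declared independently of the source document. It assumes a source document whose elements are all in the http://www.namespace.com/mynamespace namespace. The output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fish="http://www.namespace.com/mynamespace"
    exclude-result-prefixes="fish">

  <xsl:template match="/">
    <Names>
      <!-- Namespace-aware path: the fish prefix is declared on this stylesheet -->
      <Qualified>
        <xsl:value-of select="/fish:Root/fish:Fish/fish:FirstName"/>
      </Qualified>
      <!-- Namespace-agnostic search: compare only the local part of each name -->
      <LocalOnly>
        <xsl:value-of select="//*[local-name() = 'FirstName']"/>
      </LocalOnly>
    </Names>
  </xsl:template>
</xsl:stylesheet>
```

The prefix used in the stylesheet does not have to match the one used in the source document; only the namespace URL has to match.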
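The substring() call can be sketched in a stylesheet as follows. This assumes a source document in which the first Fish element has a Type of "Goldfish"; that value is an assumption, since the sample document is not reproduced here. The output element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Parts>
      <!-- Start at character 1 and take 4 characters: "Gold" for "Goldfish" -->
      <Prefix><xsl:value-of select="substring(/Root/Fish[1]/Type, 1, 4)"/></Prefix>
      <!-- Length omitted: runs to the end of the string, "fish" for "Goldfish" -->
      <Rest><xsl:value-of select="substring(/Root/Fish[1]/Type, 5)"/></Rest>
    </Parts>
  </xsl:template>
</xsl:stylesheet>
```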
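The predicate forms above can be sketched together in one stylesheet. It assumes a source document with a Root element containing at least two Fish elements, each with FirstName and Type children; the "Goldfish" value and the output element names are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Selections>
      <!-- Numeric predicate: the second Fish element -->
      <Second><xsl:value-of select="/Root/Fish[2]/FirstName"/></Second>
      <!-- Boolean predicate on a child element -->
      <Halibut><xsl:value-of select="/Root/Fish[Type = 'Halibut']/FirstName"/></Halibut>
      <!-- Two tests joined with "or" -->
      <Either>
        <xsl:value-of select="count(/Root/Fish[Type = 'Halibut' or Type = 'Goldfish'])"/>
      </Either>
    </Selections>
  </xsl:template>
</xsl:stylesheet>
```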
17 XSLT
18 XSLT Basics XSLT builds a stylesheet that contains a series of instructions that, once executed, build a standard XML document. XSLT stylesheets are XML documents, with special instruction elements that use the XSLT namespace: “http://www.w3.org/1999/XSL/Transform”. XSLT uses XPath for retrieving data from the source XML document, while it directly declares all elements in the target XML document. XPath is never used by XSLT for operations involving the target document. All XSLT operations revolve around Templates. XSLT instruction elements are processed and executed by an XSLT transformation engine. These engines take in the stylesheet and source XML document, and process them to produce the target XML document. Common XSLT transformation engines include Xalan and Saxon.
19 XSLT Declaration & Namespaces
XSLT stylesheets are declared with a common root element and the XSLT namespace:
20 XSLT Versions
XSLT has seen three major versions released in its lifespan. XSLT 3.0 is itself a W3C standard, but running it here requires a paid license from Saxonica. PilotFish does not currently license XSLT 3.0 with its product; however, XSLT 3.0 is supported and can be enabled if the client has a license.
XSLT 1.0 and 2.0 are both fully supported in the PilotFish product. XSLT 2.0 tends to have significantly higher performance, as well as newer and more advanced features. However, a small number of legacy PilotFish extensions built into the Data Mapper depend on XSLT 1.0. When those specific features are required, XSLT 1.0 is recommended; otherwise, XSLT 2.0 is recommended. These legacy features are all shortcuts that make complex XSLT operations easier, so the same results can still be achieved in XSLT 2.0 with a little more work.
21 XSLT Templates
Templates are the engine of XSLT. They are executed by matching on an XPath expression. Example:
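A minimal sketch of a stylesheet declaration. The xsl prefix is a widely used convention rather than a requirement, and the version shown is an assumption; the namespace URL is the XSLT namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element of every stylesheet, bound to the XSLT namespace -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Templates and other top-level instructions go here -->

</xsl:stylesheet>
```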
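A sketch of two templates working together, assuming a source document with a Root element containing Fish elements that each have a FirstName child. The target element names (Names, Name) are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches the document root and starts the output -->
  <xsl:template match="/">
    <Names>
      <!-- Hand each Fish element to whichever template matches it -->
      <xsl:apply-templates select="Root/Fish"/>
    </Names>
  </xsl:template>

  <!-- Runs once for every Fish element passed in by apply-templates -->
  <xsl:template match="Fish">
    <Name><xsl:value-of select="FirstName"/></Name>
  </xsl:template>
</xsl:stylesheet>
```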
22 XSLT Outputs XSLT always outputs the target XML document, specified by the logic within the stylesheet. Target XML elements and attributes can be hardcoded as if writing a normal XML document:
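A sketch of a few ways to write target nodes: literal elements and attributes, computed ones built with the element and attribute instructions, and an attribute value template in curly braces. It assumes a source document with Root/Fish elements that have FirstName and Type children. All target names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- Hardcoded element and attribute, written as in a normal XML document -->
    <Aquarium location="lobby">
      <!-- The same kind of output, built with XSLT instructions -->
      <xsl:element name="Resident">
        <xsl:attribute name="species">
          <xsl:value-of select="Root/Fish[1]/Type"/>
        </xsl:attribute>
        <xsl:value-of select="Root/Fish[1]/FirstName"/>
      </xsl:element>
      <!-- Curly braces evaluate an XPath expression inside a literal attribute -->
      <Resident species="{Root/Fish[2]/Type}">
        <xsl:value-of select="Root/Fish[2]/FirstName"/>
      </Resident>
    </Aquarium>
  </xsl:template>
</xsl:stylesheet>
```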
23 XSLT Conditions
Conditional logic in XSLT is done one of two ways. A simple
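XSLT provides two conditional instructions: if, for a single test with no else branch, and choose, with when and otherwise branches. The sketch below shows both, assuming a source document with Root/Fish elements that have a Type child. The Type values, target element names, and habitat labels are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Report>
      <!-- if: output the Note only when the test is true -->
      <xsl:if test="Root/Fish[1]/Type = 'Goldfish'">
        <Note>First fish is a goldfish</Note>
      </xsl:if>

      <!-- choose: the first matching when wins, otherwise is the fallback -->
      <xsl:choose>
        <xsl:when test="Root/Fish[2]/Type = 'Halibut'">
          <Habitat>Saltwater</Habitat>
        </xsl:when>
        <xsl:when test="Root/Fish[2]/Type = 'Goldfish'">
          <Habitat>Freshwater</Habitat>
        </xsl:when>
        <xsl:otherwise>
          <Habitat>Unknown</Habitat>
        </xsl:otherwise>
      </xsl:choose>
    </Report>
  </xsl:template>
</xsl:stylesheet>
```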
24 XSLT Iteration Iteration is the most common form of flow-control operation done in XSLT. It is always done by iterating over collections of elements in the source XML document. The
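A sketch of iteration with the for-each instruction, assuming a source document with Root/Fish elements that have FirstName and Type children. Inside the loop, the context node is the Fish element currently being visited, so paths are relative to it. The target element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Residents>
      <!-- Runs the body once per Fish element, in document order -->
      <xsl:for-each select="Root/Fish">
        <!-- position() is the 1-based index of the current iteration -->
        <Resident number="{position()}">
          <Name><xsl:value-of select="FirstName"/></Name>
          <Species><xsl:value-of select="Type"/></Species>
        </Resident>
      </xsl:for-each>
    </Residents>
  </xsl:template>
</xsl:stylesheet>
```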
25 XSLT Variables
XSLT allows values to be stored in variables. A variable can either be given a hardcoded text value or assigned a value from the source XML document using an XPath expression. Once a variable has a value assigned, it cannot be re-assigned. There is one apparent exception to this rule: if the original declaration of the variable exists in a scope that repeats, such as a “for-each”, it will be re-declared and assigned a fresh value each time the loop iterates.
XSLT variables have implicit types, generally either a String, Node, or NodeSet, depending on the expression used to assign them their value. Certain newer versions of XSLT allow for explicitly assigning types to variables, but this is not always the case.
XSLT variable syntax:
26 XSLT Parameters
XSLT parameters are similar to variables; however, their values are provided from outside the stylesheet. Parameters must be declared at the top of the stylesheet, and are generally not assigned values within the XSLT, as those values are provided by the XSLT transformation engine. Like variables, XSLT parameters cannot have their values re-assigned. However, unlike variables, there are no exceptions to this rule.
XSLT parameter declaration syntax:
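A sketch of variable declarations, assuming a source document with Root/Fish elements that have FirstName and Type children. The variable names and target element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- Hardcoded text: select holds an XPath string literal, hence the inner quotes -->
    <xsl:variable name="greeting" select="'Hello'"/>
    <!-- Value taken from the source document with an XPath expression -->
    <xsl:variable name="firstType" select="Root/Fish[1]/Type"/>

    <Greetings>
      <xsl:for-each select="Root/Fish">
        <!-- Declared inside the loop, so it is bound afresh on every iteration -->
        <xsl:variable name="name" select="FirstName"/>
        <Greeting><xsl:value-of select="concat($greeting, ', ', $name)"/></Greeting>
      </xsl:for-each>
      <!-- Variables are referenced with a leading dollar sign -->
      <FirstType><xsl:value-of select="$firstType"/></FirstType>
    </Greetings>
  </xsl:template>
</xsl:stylesheet>
```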
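A sketch of a top-level parameter, assuming a source document with Root/Fish elements that have FirstName and Type children. The parameter name requestedType, its default value, and the target element names are hypothetical. The default is optional and is used only when the transformation engine supplies no value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declared as a direct child of the stylesheet element -->
  <xsl:param name="requestedType" select="'Halibut'"/>

  <xsl:template match="/">
    <Matches>
      <!-- Parameters are referenced with a dollar sign, exactly like variables -->
      <xsl:for-each select="Root/Fish[Type = $requestedType]">
        <Name><xsl:value-of select="FirstName"/></Name>
      </xsl:for-each>
    </Matches>
  </xsl:template>
</xsl:stylesheet>
```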
27 Plain Text When assigning values to target elements, XSLT also has the ability to assign plain text. There are two ways of doing this. Simply write the text into the element as if composing a normal XML document:
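A sketch of writing plain text into target elements. The first form is the literal text described above. The second uses the XSLT text instruction, which is assumed here to be the other approach the slide refers to. The target element names and text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <Labels>
      <!-- Literal text, written as in a normal XML document -->
      <Label>Fish of the day</Label>
      <!-- The text instruction outputs its content exactly, including whitespace -->
      <Label><xsl:text>Fish of the day</xsl:text></Label>
    </Labels>
  </xsl:template>
</xsl:stylesheet>
```

The text instruction is mainly useful when whitespace matters, since whitespace-only text written directly into a stylesheet is normally stripped.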
28 Escaping XML
Control characters such as < and > (to name a few) must always be escaped when being used literally in any form of text content. For example, > becomes &gt; Please consult online references for more information about character escaping in XML.
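A short sketch of escaped characters in element text and in an attribute value. The element names, attribute name, and values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Rules>
  <!-- Read by the parser as: Length > 10 & Weight < 5 -->
  <Rule>Length &gt; 10 &amp; Weight &lt; 5</Rule>
  <!-- Inside a double-quoted attribute, a literal double quote is escaped too -->
  <Rule label="the &quot;big&quot; ones">Length &gt; 50</Rule>
</Rules>
```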
29 Advanced XSLT Capabilities
30 Java Callouts
XSLT supports calling out to external applications that are available to the transformation engine. In the case of PilotFish, this means the broader Java code in the application. Any Java class on the PilotFish classpath, including custom code provided by customers through PilotFish’s extension capabilities, can be used via Java callouts.
The XSLT syntax for Java Callouts has two requirements: a namespace declaration to tell the XSLT engine that this functionality is being used, and a declaration for the Java class being accessed. The XSLT namespace varies based on the engine. The prefix set by the engine’s namespace must be used when declaring the Java class.
Xalan: xmlns:xalan="http://xml.apache.org/xalan"
Saxon: xmlns:java="http://saxon.sf.net/java-type"
The referenced Java class must also be declared as if it were a namespace with a prefix: xmlns:td="java:com.pilotfish.eip.TransactionData"
31 Java Callouts (Continued)
Java callouts can have instances declared as variables:
32 Identity Transforms
An Identity Transform is a template that produces an output XML document that is identical to the source XML document. However, it is possible to specify changes to be made as this process executes. This is very useful when a transformation needs to produce an output that is nearly identical to the original, with only a few modifications.
The basic declaration of the Identity Transform is in the form of a recursive template that matches on everything, and then invokes a “copy” operation on everything:
The important part is that, within the “copy” element, there is the “apply-templates” element. This tells the XSLT engine that every single item being copied should be checked against any other templates that exist to see if there is a match. To change specific items, more templates are added that match on only the specific things that should be changed. Example:
33 Keys and the Muenchian Method
In some cases, an XSLT transformation will require accessing a collection of source XML elements based on some form of grouping. XSLT provides a very efficient tool to do this, using the “key” function. The first part is to declare an XSLT Key. This is done with the “key” element:
34 Keys and the Muenchian Method (Continued)
The best use for Keys is something called the Muenchian Method. This is an XSLT pattern that uses Keys and one additional function to allow for iterating over elements grouped by Keys. The additional function is the “generate-id” function. This function generates a unique ID for any node passed into it. This ID is both unique and consistent, meaning the same node will always produce the same ID within a transformation. The Muenchian expression for this kind of iteration is:
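A sketch of the identity template together with one overriding template. It assumes a source document containing Fish elements with a Type child; the replacement value "Flatfish" is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: matches every attribute and node, copies it,
       then recurses so that each child is matched against all templates -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: more specific than the identity template, so it takes
       precedence for Type elements whose value is Halibut -->
  <xsl:template match="Fish/Type[. = 'Halibut']">
    <Type>Flatfish</Type>
  </xsl:template>
</xsl:stylesheet>
```

Everything that the override does not match passes through the identity template unchanged.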
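A sketch of declaring and using a key, assuming a source document with Root/Fish elements that have FirstName and Type children. The key name fish-by-type and the target element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Declared at the top level: index every Fish element by its Type value -->
  <xsl:key name="fish-by-type" match="Fish" use="Type"/>

  <xsl:template match="/">
    <Halibuts>
      <!-- key(name, value) returns all indexed nodes with that value -->
      <xsl:for-each select="key('fish-by-type', 'Halibut')">
        <Name><xsl:value-of select="FirstName"/></Name>
      </xsl:for-each>
    </Halibuts>
  </xsl:template>
</xsl:stylesheet>
```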
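A sketch of Muenchian grouping, assuming a source document with Root/Fish elements that have FirstName and Type children. The key name fish-by-type and the target element names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every Fish element by its Type value -->
  <xsl:key name="fish-by-type" match="Fish" use="Type"/>

  <xsl:template match="/">
    <Groups>
      <!-- Keep only the Fish that is the first member of its own key group:
           its generated ID equals the ID of the first node the key returns -->
      <xsl:for-each select="Root/Fish[generate-id() = generate-id(key('fish-by-type', Type)[1])]">
        <Group type="{Type}">
          <!-- Visit every Fish that shares this Type -->
          <xsl:for-each select="key('fish-by-type', Type)">
            <Name><xsl:value-of select="FirstName"/></Name>
          </xsl:for-each>
        </Group>
      </xsl:for-each>
    </Groups>
  </xsl:template>
</xsl:stylesheet>
```

The outer loop runs once per distinct Type, and the inner loop lists the members of that group. XSLT 2.0 also offers a dedicated for-each-group instruction for the same task.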