Welcome to our free XML tutorial. This tutorial is based on Webucator's Introduction to XML Training course.
In this lesson you will learn about XPath in XSLT and how it is used to identify elements, attributes and text within an XML document.
Lesson Goals
To do anything significant with XSLT, you must work with the XML Path Language (XPath). XPath is a W3C Recommendation that is used for identifying elements, attributes, text and other nodes within an XML document. XPath looks at an XML document as a tree. Each element is a branch that may have branches of its own.
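To make the tree view concrete, here is a minimal sketch in Python, used only for illustration. The sample document and the outline helper are assumptions made for this example and are not part of the course files. It parses a small document with the standard library's xml.etree.ElementTree module and lists each element with its depth in the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely modeled on a business letter.
SAMPLE = """
<Letter>
  <Head>
    <Name Title="Mr.">
      <FirstName>Joshua</FirstName>
      <LastName>Lockwood</LastName>
    </Name>
  </Head>
  <Body>
    <Para>Hello.</Para>
  </Body>
</Letter>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an Element yields its child elements
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)
tree_rows = outline(root)
for depth, tag in tree_rows:
    print("  " * depth + tag)  # indentation mirrors the depth of each branch
```

ElementTree models only elements in this sketch, so the XPath root node that sits above the document element has no direct counterpart here.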
In XSLT, XPath is used to match nodes from the source document for output templates via the match attribute of the xsl:template tag.
<xsl:template match="SOME_XPATH">
XPath is also used in the select attribute of the xsl:value-of tag to specify which elements, attributes and text to output.
<xsl:value-of select="SOME_XPATH"/>
In addition, XPath is used in conditionals and flow control statements, which are covered in the lesson on Flow Control.
XPath expressions are statements used by an XSLT processor to produce a result of one of four types: a node-set, a Boolean, a number, or a string.
The table below explains some common terms used in XPath.
Term | Definition |
---|---|
Context Node | The starting point for the expression. In XSLT, the context node is often (but not always) determined by the XPath in the match attribute of the xsl:template tag. |
Current Node | Changes as an expression is evaluated. The next part of the expression uses the last current node as its context node. |
Context Size | The number of nodes being evaluated at any point in an expression. |
Proximity Position | The position of a node relative to other nodes in a node list. The proximity position of the first node in a node list is always one (1). |
Location paths are used to point to and select portions of an XML document. The syntax of a location path is shown below.
axis::node_test[predicate]
The table below explains the parts of the location path.
Term | Definition |
---|---|
axis | Indicates the relationship between the selected node and the context node. |
node test | Provides the name or class of the nodes to reference. |
predicate | Further filters the nodeset. |
We will examine each part of the location path soon, but first let's look at some examples, some of which have been taken as is from the W3C documentation; others have been slightly modified.
child::firstname selects the firstname element children of the context node
child::* selects all element children of the context node
child::text() selects all text node children of the context node
child::node() selects all the children of the context node, whatever their node type
attribute::name selects the name attribute of the context node
attribute::* selects all the attributes of the context node
descendant::firstname selects the firstname element descendants of the context node
ancestor::name selects all name ancestors of the context node
ancestor-or-self::div selects the div ancestors of the context node and, if the context node is a div element, the context node as well
descendant-or-self::para selects the para element descendants of the context node and, if the context node is a para element, the context node as well
self::para selects the context node if it is a para element, and otherwise selects nothing
child::chapter/descendant::para selects the para element descendants of the chapter element children of the context node
child::*/child::para selects all para grandchildren of the context node
/ selects the document root (which is always the parent of the document element)
/descendant::para selects all the para elements in the same document as the context node
/descendant::olist/child::item selects all the item elements that have an olist parent and that are in the same document as the context node
child::para[position()=1] selects the first para child of the context node
child::para[position()=last()] selects the last para child of the context node
child::para[position()=last()-1] selects the last but one para child of the context node
child::para[position()>1] selects all the para children of the context node other than the first para child of the context node
following-sibling::chapter[position()=1] selects the next chapter sibling of the context node
preceding-sibling::chapter[position()=1] selects the previous chapter sibling of the context node
/descendant::figure[position()=42] selects the forty-second figure element in the document
/child::doc/child::chapter[position()=5]/child::section[position()=2] selects the second section of the fifth chapter of the doc document element
child::para[attribute::type="warning"] selects all para children of the context node that have a type attribute with value warning
child::para[attribute::type='warning'][position()=5] selects the fifth para child of the context node that has a type attribute with value warning
child::para[position()=5][attribute::type="warning"] selects the fifth para child of the context node if that child has a type attribute with value warning
child::chapter[child::title='Introduction'] selects the chapter children of the context node that have one or more title children with string-value equal to Introduction
child::chapter[child::title] selects the chapter children of the context node that have one or more title children
child::*[self::chapter or self::appendix] selects the chapter and appendix children of the context node
child::*[self::chapter or self::appendix][position()=last()] selects the last chapter or appendix child of the context node
Location paths are either relative (i.e., starting with the context node) or absolute (i.e., starting with the root node of the XML document). A location path is divided into location steps. Each step consists of an axis followed by a node test. The node test may be followed by predicates, which are used to further specify the nodes to be accessed. In the next sections, we will examine the different portions of these location steps.
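The role of the context node in a relative location path can be sketched in Python with the standard library's xml.etree.ElementTree module. This is an illustration under assumptions: the sample document is hypothetical, and ElementTree accepts only a small subset of XPath in which the child axis is implied, so Recipient/Name/FirstName below corresponds to child::Recipient/child::Name/child::FirstName.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the element names echo the letter used later in this lesson.
SAMPLE = """
<BusinessLetter>
  <Head>
    <Recipient>
      <Name><FirstName>Joshua</FirstName><LastName>Lockwood</LastName></Name>
    </Recipient>
  </Head>
  <Foot>
    <Closing>
      <Name><FirstName>Bill</FirstName><LastName>Smith</LastName></Name>
    </Closing>
  </Foot>
</BusinessLetter>
"""

root = ET.fromstring(SAMPLE)
head = root.find("Head")  # use Head as the context node
foot = root.find("Foot")  # use Foot as the context node

# A relative path made of three location steps, evaluated from Head.
from_head = [e.text for e in head.findall("Recipient/Name/FirstName")]

# The same relative path selects different nodes from different context nodes.
head_names = [e.text for e in head.findall(".//FirstName")]
foot_names = [e.text for e in foot.findall(".//FirstName")]
all_names = [e.text for e in root.findall(".//FirstName")]

print(from_head, head_names, foot_names, all_names)
```

An absolute path, by contrast, would select the same nodes no matter which node is the context node, because it always starts from the root.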
axis::node_test[predicate]
Each location step has a node test, which provides the name or class of the nodes to reference. The processor looks through the nodes at the specified axis and returns a nodeset including all nodes with the name or class specified in the node test. Some examples are shown below:
Example | Description |
---|---|
/ | Indicates the root, which is one level above the document element. |
FirstName | Indicates a FirstName node. Depending on the axis, this could be an element or an attribute. |
text() | Indicates a text node. |
comment() | Indicates a comment node. |
processing-instruction() | Indicates a processing instruction. |
axis::node_test[predicate]
An axis indicates the relationship between the selected node and the context node. Below is a reference table of the available axes.
Axis | Description |
---|---|
child | children of the context node |
descendant | descendants of the context node |
parent | parent of the context node |
ancestor | ancestors of the context node |
following-sibling | all siblings that follow the context node |
preceding-sibling | all siblings that precede the context node |
following | all nodes that follow the context node |
preceding | all nodes that precede the context node |
attribute | attributes of the context node |
namespace | namespace nodes of the context node |
self | the context node |
descendant-or-self | the context node and all its descendants |
ancestor-or-self | the context node and all its ancestors |
Some location paths using just the axis and the node test are shown below.
Example | Description |
---|---|
child::FirstName | Indicates the FirstName element children of the context node. |
child::* | Indicates all element children of the context node. |
child::text() | Indicates all text node children of the context node. |
child::node() | Indicates all the children of the context node, whatever their node type. Note that attributes are not considered children of elements. |
parent::node() | Indicates the parent of the context node regardless of type. |
parent::* | Indicates the parent of the context node if that parent is an element (the only other possibility is that the parent is the document root). |
parent::Topic | Indicates the parent of the context node if that parent is an element named "Topic". |
attribute::href | Indicates the href attribute of the context node. |
attribute::* | Indicates all the attributes of the context node. |
Example | Description |
---|---|
descendant::FirstName | Indicates the FirstName element descendants of the context node. |
ancestor::Topics | Indicates all Topics ancestors of the context node. |
ancestor-or-self::div | Indicates the div ancestors of the context node and, if the context node is a div element, the context node as well. |
descendant-or-self::List | Indicates the List element descendants of the context node and, if the context node is a List element, the context node as well. |
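What a few of these axes select can be sketched in plain Python. This is only an emulation under assumptions: the sample document and the helper names (ancestor, following_sibling, preceding_sibling) are made up for the example, and ElementTree itself does not support these axes in its path syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
SAMPLE = """
<doc>
  <chapter id="c1">
    <para id="p1"/>
    <para id="p2"/>
    <para id="p3"/>
  </chapter>
  <chapter id="c2"/>
</doc>
"""

root = ET.fromstring(SAMPLE)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor(node):
    """Emulate the ancestor axis: parent, grandparent, and so on.

    XPath would also include the root node above the document element,
    which ElementTree does not model.
    """
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def following_sibling(node):
    """Emulate following-sibling: siblings after the node, in document order."""
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


def preceding_sibling(node):
    """Emulate preceding-sibling, listed here in document order."""
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)]


p2 = root.find(".//para[@id='p2']")  # the context node for the calls below
print([n.tag for n in ancestor(p2)])
print([n.get("id") for n in following_sibling(p2)])
print([n.get("id") for n in preceding_sibling(p2)])
```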
axis::node_test[predicate]
Predicates are used to filter node sets selected in the node test. Predicates are placed in square brackets following a node test. Multiple steps in a location path may have predicates.
Predicates can be relatively simple, such as child::para[position()=1], which returns the first para child of the context node. They can also be fairly complicated, such as:
/child::doc/child::chapter[position()=5]/child::section[position()=2]
which returns the second section of the fifth chapter of the doc document element.
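When a step has more than one predicate, each predicate filters the result of the one before it, so their order matters. The sketch below models this in plain Python with lists; the data and the helper names nth and with_type are assumptions for illustration, and position 2 is used instead of 5 to keep the sample small.

```python
# Each dict stands in for a para child of the context node, in document order.
paras = [
    {"type": "note", "text": "first"},
    {"type": "warning", "text": "second"},
    {"type": "note", "text": "third"},
    {"type": "warning", "text": "fourth"},
]


def nth(nodes, n):
    """Apply a [position()=n] predicate: keep the n-th node (1-based), if any."""
    return nodes[n - 1:n]


def with_type(nodes, value):
    """Apply an [attribute::type=value] predicate."""
    return [node for node in nodes if node["type"] == value]


# child::para[attribute::type='warning'][position()=2]
# Filter to warnings first, then take the second of those.
second_warning = nth(with_type(paras, "warning"), 2)

# child::para[position()=2][attribute::type='warning']
# Take the second para first, then keep it only if it is a warning.
second_if_warning = with_type(nth(paras, 2), "warning")

# child::para[position()=3][attribute::type='warning']
# The third para is a note, so nothing is selected.
third_if_warning = with_type(nth(paras, 3), "warning")

print([p["text"] for p in second_warning])
print([p["text"] for p in second_if_warning])
print([p["text"] for p in third_if_warning])
```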
There are several example XSLT files in XPath/Demos that illustrate how XPath works. Take a moment to run through these examples by transforming XPath/Demos/Beatles.xml against each. They include:
In this exercise, you will practice using XPath by modifying the XSLT used to transform the XML document below.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="XPaths.xsl"?>
<BusinessLetter>
  <Head>
    <SendDate>November 29, 2005</SendDate>
    <Recipient>
      <Name Title="Mr.">
        <FirstName>Joshua</FirstName>
        <LastName>Lockwood</LastName>
      </Name>
      <Company>Lockwood &amp; Lockwood</Company>
      <Address>
        <Street>291 Broadway Ave.</Street>
        <City>New York</City>
        <State>NY</State>
        <Zip>10007</Zip>
        <Country>United States</Country>
      </Address>
    </Recipient>
  </Head>
  <Body>
    <List>
      <Heading>
        Along with this letter, I have enclosed the following items:
      </Heading>
      <ListItem>
        two original, execution copies of the Webucator Master Services Agreement
      </ListItem>
      <ListItem>
        two original, execution copies of the Webucator Premier Support for Developers Services Description between Lockwood &amp; Lockwood and Webucator, Inc.
      </ListItem>
    </List>
    <Para>Please sign and return all four original, execution copies to me at your earliest convenience. Upon receipt of the executed copies, we will immediately return a fully executed, original copy of both agreements to you.</Para>
    <Para>
      Please send all four original execution copies to my attention as follows:
      <Person>
        <Name>
          <FirstName>Bill</FirstName>
          <LastName>Smith</LastName>
        </Name>
        <Address>
          <Company>Webucator, Inc.</Company>
          <Street>4933 Jamesville Rd.</Street>
          <City>Jamesville</City>
          <State>NY</State>
          <Zip>13078</Zip>
          <Country>USA</Country>
        </Address>
      </Person>
    </Para>
    <Para>If you have any questions, feel free to call me at <Phone>800-555-1000 x123</Phone> or e-mail me at <Email>bsmith@webucator.com</Email>.</Para>
  </Body>
  <Foot>
    <Closing>
      <Name>
        <FirstName>Bill</FirstName>
        <LastName>Smith</LastName>
      </Name>
      <JobTitle>VP of Operations</JobTitle>
    </Closing>
  </Foot>
</BusinessLetter>
Please follow these steps.
The stylesheet for this exercise contains xsl:template elements matching Head, Body, and Foot. In each template is a comment showing what the goal output is. You will use xsl:value-of tags and XPath to create this output from the XML file shown above.
Where a space is needed between values, use the xsl:text tag.
A solution is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <XPathTests>
      <xsl:apply-templates />
    </XPathTests>
  </xsl:template>
  <xsl:template match="Head">
    <XPathTest>
      <!--OUTPUT: Mr. Joshua Lockwood is from the United States-->
      <xsl:value-of select="child::Recipient/child::Name/attribute::Title"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="child::Recipient/child::Name/child::FirstName"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="child::Recipient/child::Name/child::LastName"/>
      is from the
      <xsl:value-of select="child::Recipient/child::Address/child::Country"/>
    </XPathTest>
  </xsl:template>
  <xsl:template match="Body">
    <XPathTest>
      <!--OUTPUT: Bill Smith works at Webucator, Inc. His email is bsmith@webucator.com. If you have any questions, feel free to call me at 800-555-1000 x123.-->
      <xsl:value-of select="descendant::FirstName"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="descendant::LastName"/>
      works at
      <xsl:value-of select="descendant::Company"/>
      His email is
      <xsl:value-of select="child::Para/child::Email"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="child::Para[position() = last()]/text()[position()=1]"/>
      <xsl:value-of select="child::Para/child::Phone"/>.
    </XPathTest>
  </xsl:template>
  <xsl:template match="Foot">
    <XPathTest>
      <!--OUTPUT: VP of Operations: Smith, Bill -->
      <xsl:value-of select="descendant::JobTitle"/>:
      <xsl:value-of select="descendant::LastName"/>,
      <xsl:value-of select="descendant::FirstName"/>
    </XPathTest>
  </xsl:template>
</xsl:stylesheet>
The XPath syntax can sometimes be lengthy. Thankfully, there is an abbreviated syntax that is much more commonly used. The table below shows some of these abbreviations.
Abbreviation | Long Form or Meaning |
---|---|
(empty) | child:: |
. | self::node() |
.. | parent::node() |
@ | attribute:: |
.// | ./descendant-or-self::node()/ |
// | /descendant-or-self::node()/ |
* | all child elements of the context node |
@* | all attributes of the context node |
[n] | [position() = n] |
The child:: axis is the default axis, so it can go unnamed; hence, the empty cell in the table above.
Long Form | Abbreviated Syntax |
---|---|
child::firstname | firstname |
child::* | * |
child::text() | text() |
attribute::name | @name |
attribute::* | @* |
descendant::firstname | .//firstname |
child::chapter/descendant::para | chapter//para |
child::*/child::para | */para |
/descendant::para | //para |
child::para[position()=1] | para[1] |
child::para[attribute::type="warning"] | para[@type="warning"] |
child::*[self::chapter or self::appendix] | *[name()='chapter' or name()='appendix'] |
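The abbreviated forms are also what many XPath-aware libraries accept. As a hedged illustration, the Python sketch below uses the standard library's xml.etree.ElementTree module, which supports only a subset of the abbreviated syntax. The sample document is hypothetical, and because ElementTree cannot select attribute nodes, the @Title step is replaced by a call to get().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, modeled on the Recipient element of the letter.
SAMPLE = """
<Recipient>
  <Name Title="Mr.">
    <FirstName>Joshua</FirstName>
    <LastName>Lockwood</LastName>
  </Name>
  <Address>
    <City>New York</City>
    <State>NY</State>
  </Address>
</Recipient>
"""

recipient = ET.fromstring(SAMPLE)  # the context node

# *  is short for  child::*
child_tags = [e.tag for e in recipient.findall("*")]

# Name/FirstName  is short for  child::Name/child::FirstName
first_name = recipient.find("Name/FirstName").text

# .//City  selects City descendants of the context node
any_city = recipient.find(".//City").text

# *[@Title='Mr.']  is short for  child::*[attribute::Title='Mr.']
titled = [e.tag for e in recipient.findall("*[@Title='Mr.']")]

# ..  is short for  parent::node()
parent_tag = recipient.find(".//City/..").tag

# Name/@Title has no ElementTree equivalent; read the attribute instead.
title = recipient.find("Name").get("Title")

print(child_tags, first_name, any_city, titled, parent_tag, title)
```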
This exercise is identical to the previous exercise except that you will be using the abbreviated syntax of XPath.
Please follow these steps.
The stylesheet for this exercise contains xsl:template elements matching Head, Body, and Foot. In each template is a comment showing what the goal output is. You will use xsl:value-of tags and XPath to create this output.
Where a space is needed between values, use the xsl:text tag.
A solution is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <XPathTests>
      <xsl:apply-templates />
    </XPathTests>
  </xsl:template>
  <xsl:template match="Head">
    <XPathTest>
      <!--OUTPUT: Mr. Joshua Lockwood is from the United States-->
      <xsl:value-of select="Recipient/Name/@Title"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="Recipient/Name/FirstName"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="Recipient/Name/LastName"/>
      is from the
      <xsl:value-of select="Recipient/Address/Country"/>
    </XPathTest>
  </xsl:template>
  <xsl:template match="Body">
    <XPathTest>
      <!--OUTPUT: Bill Smith works at Webucator, Inc. His email is bsmith@webucator.com. If you have any questions, feel free to call me at 800-555-1000 x123.-->
      <xsl:value-of select=".//FirstName"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select=".//LastName"/>
      works at
      <xsl:value-of select=".//Company"/>
      His email is
      <xsl:value-of select="Para/Email"/>
      <xsl:text> </xsl:text><xsl:value-of select="Para[last()]/text()[1]"/>
      <xsl:value-of select="Para/Phone"/>.
    </XPathTest>
  </xsl:template>
  <xsl:template match="Foot">
    <XPathTest>
      <!--OUTPUT: VP of Operations: Smith, Bill -->
      <xsl:value-of select=".//JobTitle"/>:
      <xsl:value-of select=".//LastName"/>,
      <xsl:value-of select=".//FirstName"/>
    </XPathTest>
  </xsl:template>
</xsl:stylesheet>
Functions are often used within predicates to help identify a node or node set or to find out information about a node or node set. Below are reference tables showing some of the more common core XPath functions.
Function | Description |
---|---|
last() | Returns the number of nodes in the selected node set (the context size). |
position() | Returns the position of the context node in the selected node set. |
count() | Takes a location path as an argument and returns the number of nodes in that location path. |
id() | Takes an id as an argument and returns the node that has that id. |
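The relationship between position(), last() and count() can be sketched in plain Python. The node set below is a hypothetical list of labels standing in for selected nodes, and the select helper is an assumption for illustration only.

```python
# Hypothetical node set: four para nodes selected by one location step.
node_set = ["p1", "p2", "p3", "p4"]

# For this node set, last() and count() would both report 4.
context_size = len(node_set)


def select(nodes, predicate):
    """Keep the nodes for which predicate(position, last) is true.

    position is 1-based, matching XPath's position() function.
    """
    last = len(nodes)
    return [
        node
        for position, node in enumerate(nodes, start=1)
        if predicate(position, last)
    ]


first = select(node_set, lambda position, last: position == 1)             # para[1]
final = select(node_set, lambda position, last: position == last)          # para[last()]
next_to_last = select(node_set, lambda position, last: position == last - 1)  # para[last()-1]
rest = select(node_set, lambda position, last: position > 1)               # para[position()>1]

print(context_size, first, final, next_to_last, rest)
```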
Function | Description |
---|---|
starts-with() | Takes a string and substring as arguments. Returns true if the string begins with the substring. Otherwise, returns false. |
contains() | Takes a string and substring as arguments. Returns true if the string contains the substring. Otherwise, returns false. |
substring-before(string, substring) | Returns the portion of the string to the left of the first occurrence of the substring. |
substring-after(string, substring) | Returns the portion of the string to the right of the first occurrence of the substring. |
substring() | Takes a string, start position and length as arguments. Returns the substring of length characters beginning with the character at start position. The first character is at position 1. |
string-length() | Takes a string as an argument and returns its length. |
name() | Returns the name of a node, such as an element. |
text() | Strictly a node test rather than a function. Selects the text node children of the context node. |
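For readers who know Python, the behavior of these string functions can be approximated as follows. This is a simplified sketch, not an XPath implementation: the function names are chosen for this example, the substring arguments are assumed to be non-empty, and substring() is assumed to receive whole numbers that fall within the string.

```python
def starts_with(string, prefix):
    """Approximate starts-with(string, prefix)."""
    return string.startswith(prefix)


def contains(string, part):
    """Approximate contains(string, part)."""
    return part in string


def substring_before(string, part):
    """Approximate substring-before(); empty string if part is not found."""
    before, found, _ = string.partition(part)
    return before if found else ""


def substring_after(string, part):
    """Approximate substring-after(); empty string if part is not found."""
    _, found, after = string.partition(part)
    return after if found else ""


def substring(string, start, length):
    """Approximate substring(); start is 1-based, unlike Python slices."""
    return string[start - 1:start - 1 + length]


street = "4933 Jamesville Rd."
print(substring_before(street, " "))  # the street number
print(substring_after(street, " "))   # the rest of the street address
print(substring("12345", 2, 3))       # three characters starting at position 2
print(len(street))                    # string-length()
```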
Function | Description |
---|---|
boolean() | Takes an object as an argument. Returns true if the object is a number other than zero or NaN, a non-empty node-set, or a string with at least one character. |
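The conversion rules of boolean() in XPath 1.0 can be sketched in Python as shown below. The function name xpath_boolean and the use of Python lists to stand in for node-sets are assumptions of this example. Note that any non-zero number converts to true, including negative numbers, and that the string "false" is true because it is not empty.

```python
import math


def xpath_boolean(value):
    """Approximate XPath 1.0 boolean() for Python stand-ins of XPath types."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero or NaN.
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        # A string is true if it has at least one character.
        return len(value) > 0
    if isinstance(value, (list, tuple)):
        # A node-set (modeled as a list here) is true if it is not empty.
        return len(value) > 0
    raise TypeError("unsupported stand-in type: " + type(value).__name__)


print(xpath_boolean(-1), xpath_boolean(0), xpath_boolean(float("nan")))
print(xpath_boolean("false"), xpath_boolean(""))
print(xpath_boolean(["node"]), xpath_boolean([]))
```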
Function | Description |
---|---|
sum() | Takes a node-set as an argument and returns the sum of the numbers obtained by converting the string value of each node to a number. |
ceiling() | Takes a number as an argument and returns the rounded-up value. |
floor() | Takes a number as an argument and returns the rounded-down value. |
round() | Takes a number as an argument and returns the rounded value. |
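A short Python sketch can make the number functions concrete. In XPath 1.0, round() returns the integer closest to its argument and, when two integers are equally close, the one nearer to positive infinity; Python's built-in round() behaves differently for such ties. The names xpath_round and xpath_sum are assumptions of this example, which ignores NaN, infinities, non-numeric strings and floating-point edge cases.

```python
import math


def xpath_round(number):
    """Approximate XPath 1.0 round(): ties go toward positive infinity."""
    return math.floor(number + 0.5)


def xpath_sum(string_values):
    """Approximate sum(): convert each node's string value to a number, then add."""
    return sum(float(value) for value in string_values)


print(xpath_round(2.5), xpath_round(-2.5), xpath_round(2.4))
print(round(2.5))          # Python's built-in rounds this tie to the even integer
print(math.ceil(2.1))      # ceiling(2.1)
print(math.floor(-2.1))    # floor(-2.1)
print(xpath_sum(["10", "2.5", " 3 "]))  # string values of three hypothetical nodes
```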
The table below shows the XPath operators.
Operator | Description |
---|---|
and | Boolean AND |
or | Boolean OR |
= | Equals |
!= | Not equal |
< | Less than |
<= | Less than or equal |
> | Greater than |
>= | Greater than or equal |
+ | Addition |
- | Subtraction |
* | Multiplication |
div | Division |
mod | Modulus |
The sample below shows how some operators and functions are used in practice.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <h1>Functions and Operators</h1>
    <h2>count()</h2>
    <code>count(beatles/beatle):</code>
    <b>
      <xsl:value-of select="count(beatles/beatle)"/>
    </b>
    <h2>contains()</h2>
    <code>contains(//beatle[last()]/@link,'webucator'):</code>
    <b>
      <xsl:value-of select="contains(//beatle[last()]/@link,'webucator')"/>
    </b><br/>
    <code>contains(//beatle[last()]/@link,'ringostarr'):</code>
    <b>
      <xsl:value-of select="contains(//beatle[last()]/@link,'ringostarr')"/>
    </b>
    <h2>=</h2>
    <code>beatles/beatle[ @real = 'no' ]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ @real = 'no' ]//firstname"/>
    </b>
    <h2>!=</h2>
    <code>beatles/beatle[ @real != 'no' ]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ @real != 'no' ]//firstname"/>
    </b>
    <h2>not()</h2>
    <code>beatles/beatle[ not(@real) ][2]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ not(@real) ][2]//firstname"/>
    </b>
    <h2>last()</h2>
    <code>beatles/beatle[ not(@real) ][last()]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ not(@real) ][last()]//firstname"/>
    </b>
    <h2>not() &amp; =</h2>
    <code>beatles/beatle[ not(@real='no') ][2]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ not(@real='no') ][2]//firstname"/>
    </b>
    <h2>not() &amp; = &amp; last()</h2>
    <code>beatles/beatle[ not(@real='no') ][last()]//firstname:</code>
    <b>
      <xsl:value-of select="beatles/beatle[ not(@real='no') ][last()]//firstname"/>
    </b>
  </xsl:template>
</xsl:stylesheet>
When XPath/Demos/BeatlesFunctions.xml, which has the same XML as XSLTBasics/Demos/Beatles.xml, is transformed against XPath/Demos/BeatlesFunctions.xsl and viewed in a browser, the output looks like this:
In this exercise, you will practice using XPath functions.
The stylesheet for this exercise contains several xsl:value-of elements. For each, replace the text XPATH in the select attribute with an actual XPath according to the instructions in the comments.
A solution is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <XPathTests>
      <!--Output the value of the FirstName child of the first Name element that doesn't have a Title attribute-->
      <XPathTest>
        <xsl:value-of select="//Name[not(@Title)]/FirstName"/>
      </XPathTest>
      <!--Output the street number of Webucator Inc's street address (i.e., 4933)-->
      <XPathTest>
        <xsl:value-of select="substring-before(//Para/Person/Address/Street,' ')"/>
      </XPathTest>
      <!--Output the paragraph text of the paragraph that contains Bill Smith's email address-->
      <XPathTest>
        <xsl:value-of select="//Para[Email='bsmith@webucator.com']"/>
      </XPathTest>
      <!--Output the number of elements contained in Joshua Lockwood's Address-->
      <XPathTest>
        <xsl:value-of select="count(//Recipient/Address/*)"/>
      </XPathTest>
      <!--Output the number of elements that contain the word "Lockwood" (should be 3)-->
      <XPathTest>
        <xsl:value-of select="count(//*[contains(text(),'Lockwood')])"/>
      </XPathTest>
    </XPathTests>
  </xsl:template>
</xsl:stylesheet>