ITEC325 XPath

XPath
and xpath functions

Originally based on XML Visual Quickstart Guide by Kevin Howard Goldberg, and notes therefrom by Jack Davis (jcdavis@radford.edu).

highlights

xpath is a convenient way to talk about specific nodes within a tree (a "nodeset")

like a file-structure: use / to go down “into” a tag

If an expression starts with /, it's an absolute path; otherwise it's a relative path.
(Every xpath expression is evaluated within a context; we already saw <xsl:template match='/'>, whose context is the root, and xsl:for-each, whose body-context is the node currently being processed.)
Just as in your filesystem, be aware of the difference between absolute and relative paths!

like unix file-system: . and .. mean current and parent node, respectively.
E.g. inside a <xsl:for-each select="/ancient_wonders/wonder/location"> loop, we could refer to <xsl:value-of select="../name" />.
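A minimal sketch of the relative-path idea above, written out as a complete stylesheet. The element names (ancient_wonders, wonder, name, location) are the ones used in the examples on this page; the exact shape of the input file is an assumption.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- This template matches "/", so its context is the document root. -->
  <xsl:template match="/">
    <!-- Absolute path: selects every location, whatever the current context is. -->
    <xsl:for-each select="/ancient_wonders/wonder/location">
      <!-- Inside the loop, the context is one location element.
           "../name" climbs to the parent wonder, then down to its name. -->
      <xsl:value-of select="../name"/>
      <xsl:text> is in </xsl:text>
      <!-- "." is the current node, i.e. the location itself. -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If a wonder has several name children, xsl:value-of prints only the first one.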

There are more …obscure naming conventions to xpath: See more details here (optional, for us).

Use […] to filter a node set:
E.g. <xsl:value-of select="/ancient_wonders/wonder[height > 100]" /> or <xsl:value-of select="/ancient_wonders/wonder/name[@lang='en']" />

You can do further refinement after filtering
E.g. <xsl:value-of select="/ancient_wonders/wonder[height > 50]/location" /> or <xsl:value-of select="/ancient_wonders/wonder[name/@lang='en']/location" /> which is equivalent to <xsl:value-of select="/ancient_wonders/wonder/name[@lang='en']/../location" />
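Here is a small sketch of filter-then-refine, as a full stylesheet. It assumes an input file whose wonder elements have name (with a lang attribute), height and location children, as in the examples above; the threshold 50 is just for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- The filter keeps only the wonders taller than 50;
         the step after the bracket then moves down to their locations. -->
    <xsl:for-each select="/ancient_wonders/wonder[height &gt; 50]/location">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- The filter can look at an attribute of a child:
         keep each wonder that has some name whose lang is 'en'. -->
    <xsl:for-each select="/ancient_wonders/wonder[name/@lang='en']/location">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that the bracket attaches directly to the node-test it filters (wonder[...]), with no slash in between.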

XPath Functions

XSL's xsl:value-of returns the string value of the first node in a node set. With XPath functions, you can perform further operations on that string/data.

video (26m14s)

Note that xsl:value-of's attribute “select” can be any expression, not just a single node (variable): e.g. The monument is <xsl:value-of select="round(height div 3.28)" /> meters tall.
This is the same as Java expressions, which can of course be more than a single variable; they include constants and function-calls as well ¹ (e.g. wage*Math.round(40*weeksWorked)/52). It is also like a SQL SELECT statement, which can select entire expressions instead of merely columns (SELECT wage*round(40*weeksWorked)/52 FROM …).

overview and official reference. Also handy: www.w3schools.com/xml/xpath_intro.asp.

Logical operators: or, and (and the function not()).

ancient_wonders/wonder[name/@lang='en' or name/@lang='de']

Note that not() is a function (not an operator), so you call it via the parentheses-notation.
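A short sketch combining or, and, and not() inside filters. The input shape (wonder elements with name/@lang and height) is assumed from the examples on this page, and the numbers 50 and 200 are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- "or": wonders that have an English name or a German name. -->
    <xsl:for-each select="/ancient_wonders/wonder[name/@lang='en' or name/@lang='de']">
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- "and" together with not(): taller than 50 but not taller than 200.
         not is called like any other function, with parentheses. -->
    <xsl:for-each select="/ancient_wonders/wonder[height &gt; 50 and not(height &gt; 200)]">
      <xsl:value-of select="name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```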

Multiplying, Dividing, Adding, Subtracting
Operators -- +, -, *, div (floating-point division (!)), mod (remainder)
Watch out for the “gotcha”: do not write “/” for division; it will be interpreted as part of an xpath (perhaps the start of an absolute path).
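To see the operators in action without depending on any particular data, here is a sketch that only uses literal numbers (so it can be applied to any XML file). The numbers themselves are arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- div is floating-point division: prints 3.5, not 3 -->
    <xsl:value-of select="7 div 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- mod is the remainder: prints 1 -->
    <xsl:value-of select="7 mod 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Operators and function-calls combine freely: prints 33 -->
    <xsl:value-of select="round(107 div 3.28)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```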

numeric comparisons:
= != &gt; &gt;= &lt; &lt;=
These aren't rendering-typos — you literally include the ampersand in the less-than operator! (Remember, XSLT code is itself valid XML):

<xsl:for-each select="ancient_wonders/wonder[height &gt;= 100]">…</xsl:for-each>

XML (Comparisons)
XSLT

Note:
A program reading an XML/HTML file will internally create the tree-data-structure being specified by the flat string of XML. After it has done so, it can find any string-of-text which includes (say) “&lt;” and replace that internally with the actual character “<”, because after the tree has been created there is no longer any potential for the < to be confused with an open-tag. That's what's happening with XSLT: Once the tree is created, it can find the “&lt;=” and consider it as “<=” without confusion.

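This explains why you can sometimes get away with writing <xsl:if test="year>0">…</xsl:if>: XML allows a literal “>” inside a quoted attribute-value, so the parser can tell that it is part of the attribute-value and not of the XML-structure. A literal “<” gets no such leniency; inside an attribute-value it must always be written “&lt;”. Either way, once the tree is built, the XSLT processor sees a plain comparison such as “year>0” or “year<=0”.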
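A sketch of comparisons inside test attributes, using the entity form throughout. It assumes each wonder has a name and (usually) a numeric height, as in the examples above; the cutoff of 100 is arbitrary.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/ancient_wonders/wonder">
      <xsl:value-of select="name"/>
      <xsl:choose>
        <!-- Less-than must be written with the entity. -->
        <xsl:when test="height &lt; 100">
          <xsl:text>: under 100</xsl:text>
        </xsl:when>
        <!-- Greater-than could be typed literally, but the entity also works. -->
        <xsl:when test="height &gt;= 100">
          <xsl:text>: 100 or more</xsl:text>
        </xsl:when>
        <!-- Reached when height is missing or is not a number,
             since both comparisons above are then false. -->
        <xsl:otherwise>
          <xsl:text>: no numeric height</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```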

XML (Math Operations)
XSLT

Counting Nodes, count(nodeSet)
Return the number of items in the provided node-set.
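A minimal sketch of count(), assuming the same ancient_wonders input as the other examples. It shows that the argument can be any node-set, including a filtered one.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Count every wonder in the document. -->
    <xsl:value-of select="count(/ancient_wonders/wonder)"/>
    <xsl:text> wonders in all; </xsl:text>
    <!-- Count only the wonders that survive the filter. -->
    <xsl:value-of select="count(/ancient_wonders/wonder[height &gt; 100])"/>
    <xsl:text> of them are taller than 100.&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```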
XML (counting nodes)
XSLT

number-to-string conversion: format-number(n,formatStr) (search that page for "Examples")
The formatStr follows the same conventions as java.text.DecimalFormat:
0 - for each digit that should always appear
# - for each digit that should appear when not 0
. - to separate integer part from fractional part
, - to separate groups of digits
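; - to separate the pattern for positive numbers from an optional pattern for negative numbers; e.g. '#,##0.00;(#,##0.00)' surrounds negative numbers with parentheses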
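A few format strings tried out on literal numbers, as a sketch (the numbers are arbitrary, and the stylesheet can be run against any XML file). The expected results in the comments follow the DecimalFormat conventions listed above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Grouping separator, and exactly two decimals: 1,234.50 -->
    <xsl:value-of select="format-number(1234.5, '#,##0.00')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Each 0 forces a digit, so the number is zero-padded: 007 -->
    <xsl:value-of select="format-number(7, '000')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Extra decimals are rounded away: 3.14 -->
    <xsl:value-of select="format-number(3.14159, '0.00')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The part after the semicolon is the pattern for negatives: (1,234.50) -->
    <xsl:value-of select="format-number(-1234.5, '#,##0.00;(#,##0.00)')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```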
XML (Format Numbers)
XSLT

Rounding Numbers: ceiling(), floor(), round().
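The three rounding functions side by side, as a sketch on literal numbers. The one surprise is that round() breaks ties toward positive infinity, so round(-2.5) is -2 rather than -3.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Smallest integer that is not less than 2.1: prints 3 -->
    <xsl:value-of select="ceiling(2.1)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Largest integer that is not greater than 2.9: prints 2 -->
    <xsl:value-of select="floor(2.9)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Nearest integer, ties go up: prints 3 -->
    <xsl:value-of select="round(2.5)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Ties go up for negatives too: prints -2 -->
    <xsl:value-of select="round(-2.5)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```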

Oddly, there are no “max” or “min” functions!
The hack is to sort, and then select the first (!):

<xsl:for-each select="ancient_wonders/wonder/history">
  <xsl:sort select="year_built" order="descending" data-type="number"/>
  <xsl:if test="position()=1">
    …
  </xsl:if>
</xsl:for-each>
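The same sort-then-take-the-first trick works for a minimum: just sort ascending. Below is a sketch as a complete stylesheet; it assumes each wonder has a name and a numeric height, and it sorts on height purely as an illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/ancient_wonders/wonder">
      <!-- data-type="number" matters: as text, "107" would sort before "40". -->
      <xsl:sort select="height" order="ascending" data-type="number"/>
      <!-- position() counts in sorted order, so 1 means the smallest height. -->
      <xsl:if test="position() = 1">
        <xsl:text>Shortest: </xsl:text>
        <xsl:value-of select="name"/>
        <xsl:text> (</xsl:text>
        <xsl:value-of select="height"/>
        <xsl:text>)&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```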

concat(str1,str2,…)

substring(src,startIndex,len), where startIndex is 1-based (the first character is at index 1).

substring-after(src,target) —return the rest of src following the first occurrence of target. (There is a corresponding substring-before.)
For example, substring-after('Gonzo, the magnificent', ',') returns ' the magnificent' (though of course, most of the time the first argument will probably be a node-select statement like “name[@language='English']” or “muppet/stageName” or “.”).

contains(src, target): Does src contain target as a substring?

starts-with(src, target): Does src start with target?
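The string functions above, tried on literal strings as a sketch (in real use the first argument would usually be a node-select, as noted). The Gonzo string is the one from the substring-after example; the expected results are in the comments.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Glue any number of strings together: Gonzo, the magnificent -->
    <xsl:value-of select="concat('Gonzo', ', ', 'the magnificent')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Start at character 1, take 5 characters: Gonzo -->
    <xsl:value-of select="substring('Gonzo, the magnificent', 1, 5)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Everything before the first comma: Gonzo -->
    <xsl:value-of select="substring-before('Gonzo, the magnificent', ',')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Everything after the first comma-and-space: the magnificent -->
    <xsl:value-of select="substring-after('Gonzo, the magnificent', ', ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Booleans print as true or false: true -->
    <xsl:value-of select="contains('Gonzo, the magnificent', 'magnif')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Comparison is case-sensitive: false -->
    <xsl:value-of select="starts-with('Gonzo, the magnificent', 'gonzo')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```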

Translating (mapping) characters: translate(src, fromLetters, toLetters) Replace any character in fromLetters with its corresponding character in toLetters. Example: translate( ., 'ESZaA', '3$2@4').
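A sketch of three common uses of translate(), on literal strings chosen for illustration. XPath 1.0 has no upper-case function, so the first call is a frequently-used workaround; the last call relies on the rule that a character in fromLetters with no partner in toLetters is simply deleted.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Upper-casing by mapping each letter: XPATH ROCKS -->
    <xsl:value-of select="translate('xpath rocks',
                                    'abcdefghijklmnopqrstuvwxyz',
                                    'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The mapping from the example above: 34$Y 2@p -->
    <xsl:value-of select="translate('EASY Zap', 'ESZaA', '3$2@4')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- An empty toLetters deletes the listed characters: 5551234 -->
    <xsl:value-of select="translate('555-1234', '-', '')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```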

One thing worth noting/remembering, is how entities work: Consider an .xsl file which contains <xsl:value-of select="translate( 'I &lt;3 ice cream', 'c&lt;e', 'kE=')"/>. This will change every c into k, every < into E, and every e into =; and therefore returns "I E3 ik= kr=am".

At first, you might balk: “but the last two arguments to translate don't have the same length -- "c&lt;e" is 6 characters, while "kE=" is 3 characters.” But that's not true after somebody reads in the .xsl file ²; read on….
The XML processor reads your file, and constructs the XML tree. After that, there wouldn't be any confusion if a string contained a "<", because the tree has already been made (there is no more possibility that a "<" could be confused as an open-tag). So it goes through that tree, and if it sees any "&lt;" in a string (leaf of the XML-tree), it literally replaces them with "<", and it's no longer confusable as an open-tag.
So with the above example: after reading in your file, it is left with a xsl:value-of element in the DOM tree, and that element contains an attribute select, whose value is: translate( 'I <3 ice cream', 'c<e', 'kE='). The last two arguments to translate are each length 3.
Now the XSLT processor takes that tree and chugs away, evaluates the xpath expression (here, just a call to translate), and returns 'I E3 ik= kr=am'.

Summing numbers in an entire node-set

<xsl:value-of select="sum(/ancient_wonders/wonder/height)
                        div
                      count(/ancient_wonders/wonder)" />

Example: XSLT computing sum, average
XML

Want to write your own node-processing function? Of course you do! This feature wasn't in the original XSLT 1.0, but was added to 2.0. Alas, no major browsers actually support XSLT 2.0 natively. However, there is apparently a javascript library from Saxonica, Saxon-CE, which you can use.

position() — returns the index of the current node, among the node-set which contains it.
XML (Test Position)
XSLT (Test Position)
last() — returns the index of the last node, among the node-set which contains it.

Remember that indices in XSLT/XPATH are 1-based, so last() could also be stated as returning the size of the current node's node-set.

These two functions are a bit odd: unlike most, they don't take any input; they use the current node as the input. But the weird part is, they also include the context that selected the current node. So even if you're in the body of a for-each loop (and the body is operating on one particular node), the last() function is returning information about how many nodes are being iterated over. (More precisely, as the specs put it, position and last return “the context position (resp., size) from the expression evaluation context”).
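A typical use of the pair is a separator that appears between items but not after the final one. Here is a sketch, assuming the usual ancient_wonders input with a name inside each wonder.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/ancient_wonders/wonder">
      <!-- position() is this wonder's 1-based index within the loop;
           last() is the number of wonders being iterated over. -->
      <xsl:value-of select="position()"/>
      <xsl:text> of </xsl:text>
      <xsl:value-of select="last()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="name"/>
      <!-- Separator after every item except the final one. -->
      <xsl:if test="position() != last()">
        <xsl:text>; </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```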

Note: We said that square-brackets in an XPath filter the node-set on a boolean condition. You may find examples on the web where the “condition” is a number [n] instead of a boolean; this is xpath short-hand ³ for [position()=n].
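A quick sketch of the numeric short-hand next to its long form, again assuming the ancient_wonders input. The first two lines of output should be identical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Short-hand: the first wonder. -->
    <xsl:value-of select="/ancient_wonders/wonder[1]/name"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Long form of the same filter. -->
    <xsl:value-of select="/ancient_wonders/wonder[position()=1]/name"/>
    <xsl:text>&#10;</xsl:text>
    <!-- last() evaluates to a number, so this picks the final wonder. -->
    <xsl:value-of select="/ancient_wonders/wonder[last()]/name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```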

¹ Expressions in Java don't end in semicolons. It's statements that need to end in a semicolon; statements are themselves built out of keywords, punctuation, and expressions (“assignment statements”, “if-else statements”, “block statements”, etc.). ↩

² This difference between source-file and internalized-representation is reminiscent of how, in Java, "ab\ncd" looks like it has 6 characters in the source-code, but really once the compiler reads the file it's actually a string with just 5 characters. ↩

³ This explanation is a bit backwards from history (originally a number was like an array-lookup, which they then generalized to boolean filters), however the more useful principle to remember is filter, not array-lookup; just be aware of what the shorthand means. ↩

This page licensed CC-BY 4.0 Ian Barland

Please mail any suggestions (incl. typos, broken links) to ibarland@radford.edu
