XPath: String Functions on Georg Forster Voyage Narrative
In the Georg Forster file, let's find/do the following:
contains(., ' '): (literal text searches only)
Find all the date elements that contain a literal square bracket character: [
//date[contains(.,"[")]
The contains() function always requires two arguments in the format haystack, needle (where you're looking, what you're looking for). The second argument is literal text, not a regex pattern, and goes inside quotes: "" or ''.
matches(., ' '): (regex patterns, which may also include literal text)
Find all the date elements that contain 4 digits together (like a 4-digit year)
//date[matches(.,"\d{4}")]
Find all the persName elements that start with a lower-case letter (note how the caret and dollar sign work in XML nodes: ^ anchors to the start of the node's text and $ to its end):
//persName[matches(.,"^[a-z]")]
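If you happen to know Python, the difference between contains() and matches() can be sketched with a rough analogue: a literal substring test versus a regex search. This is only an illustration and not XPath. The sample date strings and the helper names `xpath_contains` and `xpath_matches` are invented for the sketch and do not come from the Forster file.

```python
import re

# Invented sample strings standing in for the text of <date> elements.
dates = ["[July] 1772", "13th", "March 1773"]

def xpath_contains(haystack: str, needle: str) -> bool:
    # Like contains(): the needle is literal text, so "[" has no special meaning.
    return needle in haystack

def xpath_matches(haystack: str, pattern: str) -> bool:
    # Like matches(): the pattern is a regex and may match anywhere in the
    # string unless it is anchored with ^ or $.
    return re.search(pattern, haystack) is not None

with_bracket = [d for d in dates if xpath_contains(d, "[")]
with_year = [d for d in dates if xpath_matches(d, r"\d{4}")]
print(with_bracket)  # ['[July] 1772']
print(with_year)     # ['[July] 1772', 'March 1773']
```

Passing "[" as a regex pattern would be an error, because a square bracket opens a character class there. That is why the literal search belongs to contains().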
normalize-space(): (remove extra white space from output in reading nodes)
//persName[matches(.,"^[a-z]")] ! normalize-space()
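or, when only one persName matches (normalize-space() takes a single string, so a sequence of several nodes raises an error in XPath 2.0 and later):
normalize-space(//persName[matches(.,"^[a-z]")])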
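To see what normalize-space() does to a string, here is a rough Python analogue. It is a sketch, not XPath: the function name `normalize_space` and the sample strings are invented for illustration. The list comprehension plays the role of the ! operator, applying the function to one item at a time.

```python
import re

def normalize_space(text: str) -> str:
    # Collapse each run of spaces, tabs, and line breaks to one space,
    # then trim the ends, as normalize-space() does.
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

# Invented sample strings standing in for persName text with messy whitespace.
names = ["  the  captain\n   of the ship ", "mr.\tHodges"]

# Apply the function to each item in turn, as the ! operator does.
cleaned = [normalize_space(n) for n in names]
print(cleaned)  # ['the captain of the ship', 'mr. Hodges']
```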
substring-before(): (retrieves just a piece that comes before a literal string of text)
Find all the persName elements that contain an 's, and then return the substring-before it:
//persName[contains(.,"'s")] ! substring-before(.,"'s") ! normalize-space() => distinct-values()
substring-after(): (like the above, but retrieves a piece that comes just after a literal string)
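As a rough illustration of how substring-before() and substring-after() split a string around a literal marker, here is a Python analogue. This is a sketch under assumptions, not XPath: the helper names and the sample strings are invented, and the separator is assumed to be non-empty. Both functions work from the first occurrence of the marker and give back an empty string when the marker is missing.

```python
def substring_before(text: str, sep: str) -> str:
    # Text before the first occurrence of sep; empty string if sep is absent.
    # Assumes sep is not empty.
    head, found, _ = text.partition(sep)
    return head if found else ""

def substring_after(text: str, sep: str) -> str:
    # Text after the first occurrence of sep; empty string if sep is absent.
    return text.partition(sep)[2]

# Invented sample strings standing in for persName text.
names = ["Captain Cook's", "Mr. Hodges's drawing", "Captain Cook's boat"]

before = [substring_before(n, "'s") for n in names]
# dict.fromkeys drops duplicates while keeping first-seen order,
# a stand-in for distinct-values().
unique_before = list(dict.fromkeys(before))
after = [substring_after(n, "'s") for n in names]
print(unique_before)  # ['Captain Cook', 'Mr. Hodges']
print(after)          # ['', ' drawing', ' boat']
```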
tokenize(): (breaks apart a string into pieces based on a regex. We often use this with a position() function to grab just the piece we want):
Take the longitude readings in Forster, normalize spaces. Then tokenize on a white space of any kind, and take the SECOND token, then filter to return ONLY the tokens that hold one or more digits:
//geo[@select='lon'] ! normalize-space() ! tokenize(., "\s")[2][matches(., "\d+")]
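The tokenize-then-filter steps above can be sketched in Python to show what each piece of the expression contributes. This is only an analogue: the sample readings are invented and are not the real longitude values in the Forster file. One thing to watch is that XPath positions start at 1, so the [2] in XPath corresponds to index 1 in Python.

```python
import re

# Invented sample strings standing in for normalized longitude readings.
readings = ["Long. 24 W.", "Long. unknown", "Long. 31 E."]

second_tokens = []
for reading in readings:
    tokens = re.split(r"\s", reading)   # like tokenize(., "\s")
    if len(tokens) >= 2:
        second = tokens[1]              # XPath [2] is Python index 1
        if re.search(r"\d+", second):   # like [matches(., "\d+")]
            second_tokens.append(second)

print(second_tokens)  # ['24', '31']
```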
lower-case() and upper-case(): (takes a string and converts it to all lower-case or all upper-case)
Upper-case all the placeNames in Forster:
//placeName ! normalize-space() ! upper-case(.)
string-join(): (joins together a sequence of multiple strings with a separator)
String together all the placeNames in the document. Maybe let's normalize the spaces, first.
string-join(//placeName ! normalize-space(), ", ")
or
//placeName ! normalize-space() => string-join(", ")
concat(): (joins together specific results in a one-to-one way, as many arguments as you have single pieces to put together)
Patch together the first persName and the first placeName in each paragraph that has these.
//p[placeName and persName] ! concat('first place: ', placeName[1], ' first person: ', persName[1]) ! normalize-space()
or
//p[placeName and persName] ! normalize-space(concat('first place: ', placeName[1], ' first person: ', persName[1]))
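The difference between string-join() and concat() can be sketched with a rough Python analogue: string-join() takes one sequence of any length plus a separator, while concat() takes a fixed list of individual pieces. This is an illustration only. The data structure and the names in it are invented stand-ins for paragraphs and are not taken from the Forster file.

```python
# Invented sample data: each dict stands in for one <p> with its
# placeName and persName children, already space-normalized.
paragraphs = [
    {"places": ["Madeira", "Funchal"], "persons": ["Captain Cook"]},
    {"places": ["Cape Town"], "persons": ["Mr. Hodges", "Dr. Sparrman"]},
]

# Like string-join(): one separator between every item of a sequence,
# however many items there are.
all_places = ", ".join(place for p in paragraphs for place in p["places"])
print(all_places)  # Madeira, Funchal, Cape Town

# Like concat(): a fixed number of pieces, each named explicitly,
# built once per paragraph.
lines = [
    "first place: " + p["places"][0] + " first person: " + p["persons"][0]
    for p in paragraphs
]
for line in lines:
    print(line)
```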
More Reading/Reference/Examples:
- For more detailed discussion of XPath string functions, see our extended wiki on [XPath Functions with Strings](https://github.com/ebeshero/DHClass-Hub/wiki/XPath:-Functions-with-Strings)
- Quick go-to guide on XPath Functions: Xpath Functions We Use Most