10‐user‐defined‐functions

djbpitt edited this page Oct 11, 2023 · 11 revisions

Goals

This section expands on the user-defined functions introduced in stage 9. You can declare user-defined functions in the file that uses them, but if you use the same functions in more than one place, it’s more common to declare them in a separate library module that you then import into a main module that, for example, creates the model or view for part of your app. In this section you will refactor the hoax:round-geo() function from the previous lesson to take advantage of arity, that is, the number of arguments a function requires. We recommend reading XQuery for humanists, section 6.2, “Writing your own functions” (pp. 114–19), alongside this tutorial.

What is a user-defined function?

A user-defined function is an XPath and/or XQuery expression (or combination of expressions) that you have given a name. Like most standard library functions, it typically takes input and provides output. As part of giving your function a name, you also define its input parameters (their number, names, and datatypes) and the type of its output. The combination of function name, parameters, and output type is called the function signature. Below is a function copied from the previous stage, followed by a discussion of the parts that make up the function declaration.

declare function hoax:round-geo($input as xs:string) as xs:string {
    format-number(number($input), '0.00000')
};

Here are some details about the parts of a user-defined function:

  1. The keyword expression declare function is self-explanatory: these are reserved words in XQuery that tell the processor you’re about to declare a function.
  2. The next part is the name you choose to give to your function. The function name has two parts: hoax: is the prefix bound to your project’s namespace (user-defined functions must be in a namespace) and round-geo describes to the reader what the function will do. The part of the function name after the prefix is called the local name; the computer doesn’t know or care what the words mean, so you can think of the name as an opportunity to make your function more self-documenting for a human. The combination of namespace prefix and local name is called the qualified name, or QName.
  3. The function name must be followed by parentheses and inside the parentheses you specify the type of input your function expects. Each piece of input, called a parameter, must be given a name (prefixed with a dollar sign, like a variable) and a datatype specification (using the as keyword). The datatype specification is technically optional, but we always use it because it helps protect us from some types of error. The function above has one input parameter, and we’re calling that value $input and saying it will be exactly one string. Like the name of the function, the name you use for the parameter also doesn’t matter to the computer, so you should choose a name that means something to you. It’s common to call a single parameter $input, but because each parameter must have a unique name, if your function required more than one parameter you would need to come up with names to distinguish them.
  4. The function name, with parameters inside parentheses, is followed by a specification of the expected output, using the keyword as. The output specification is optional, but, as with the parameter specifications, we always use it because it helps protect us from writing functions that don’t do what we think they’re doing. In this case, the function will evaluate to a single string.
  5. The function signature is followed by a pair of curly braces, which contain the function body, that is, the XQuery or XPath that processes the parameters and creates the output. In our case the body consists of nested XPath functions. From the inside out, we use the standard library function number() to convert the $input string to a number. That number becomes the first argument to the standard library function format-number(), which we call with two arguments: a number and a pattern, called a picture string. We define the second argument as a picture string that requires exactly 5 digits to the right of the decimal point. (Because we don’t use format-number() very often we had to look up the pattern we needed in the official specification, which is to say that you shouldn’t be reluctant to Look Stuff Up!) Because format-number() returns a string and our expected output is a string, we don’t need to include any other code in our function.
  6. Finally, all declare statements in XQuery must be followed by a semicolon, which follows the closing curly brace.

If you’ve created functions in other programming languages you may notice that there is no return statement. The value of the function, which is returned automatically, is the result of evaluating the expression in the function body with the supplied arguments bound to the parameters.
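To see the declaration at work, here is a minimal sketch of a self-contained main module that declares the function and then calls it. The namespace URI and the sample coordinate are placeholders we made up for illustration; they are not values from the project.

```xquery
xquery version "3.1";

(: Hypothetical placeholder URI; use the namespace your project declares. :)
declare namespace hoax = "http://example.com/hoax";

declare function hoax:round-geo($input as xs:string) as xs:string {
    format-number(number($input), '0.00000')
};

(: No return statement: the call evaluates to the value of the function body. :)
hoax:round-geo('51.507351')
```

With this sample input the query should evaluate to the string 51.50735.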

Limitations to our initial user-defined function

The function above works well if you’ve decided that whenever you render latitude and longitude in your edition the value should always have exactly 5 digits to the right of the decimal point. What, though, if you want to use different numbers of decimal places for different purposes in some future iteration of the project? For example, you might prefer 5 digits for a table because it’s more precise but two digits in running prose because it’s more legible. Below you’ll extend your user-defined function to allow different numbers of decimal places.

Enhancing your function

We write above that functions are named expressions that have defined inputs and outputs. This is true, but the name is not the only thing that identifies a function; a function is also identified by its arity, or the number of pieces of input it expects. These inputs are called arguments or parameters (strictly speaking, parameters are the names in the declaration and arguments are the values supplied in a call, but the terms are often used interchangeably). Because the function above declares one parameter ($input) it has an arity of one. You can even refer to it as hoax:round-geo#1, where the #1 means that this is the function with that name and an arity of one. The combination of function name plus arity must be unique, which means that you cannot have two functions with the same name and the same arity. You can, though, declare two functions with the same name as long as they have different arities.
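A named function reference is written as the function name, a # sign, and the arity, with no parentheses. The sketch below, which again uses a made-up namespace URI and sample value, binds that reference to a variable, asks the standard function-arity() function for its arity, and then calls the function through the variable.

```xquery
xquery version "3.1";

(: Hypothetical placeholder URI; use the namespace your project declares. :)
declare namespace hoax = "http://example.com/hoax";

declare function hoax:round-geo($input as xs:string) as xs:string {
    format-number(number($input), '0.00000')
};

(: name#arity picks out exactly one function :)
let $f := hoax:round-geo#1
return (function-arity($f), $f('51.507351'))
```

The result should be a two-item sequence: the integer 1 followed by the string 51.50735.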

It isn’t possible in XQuery to declare a single function that accepts either one argument (in this case a value to format, assuming a precision of 5 decimal places) or two arguments (both a value to format and a precision). But it is possible to create two functions with the same name and different arities, which can produce the same effect. Here’s what that definition and documentation look like in our code for this stage:

(:~ 
 : hoax:round-geo() formats and rounds geographic coordinates.
 :
 : Arity-1 version supplies default precision of 5 and calls arity-2 version
 : Arity-2 version requires user-supplied precision as second argument
 :
 : Default precision of 5 is street-level accurate while also being brief enough to display
 :
 : @param $input : xs:string any lat or long value
 : @param $precision : xs:integer any non-negative integer
 : @return xs:string
 :)
declare function hoax:round-geo($input as xs:string) as xs:string {
    hoax:round-geo($input, 5)
};

declare function hoax:round-geo($input as xs:string, $precision as xs:integer) as xs:string {
    format-number(
        number($input), 
        '0.' || string-join((1 to $precision) ! '0')
    )
};

Instead of including the full expression for the default of 5 decimal places, as we did in the first attempt, now the arity-1 function passes the value to be formatted along to the arity-2 function, specifying a precision of 5 as its second argument. The new arity-2 function takes two arguments, a string called $input and an integer called $precision. It evaluates the exact same set of nested XPath functions, except that this time it uses the number provided by $precision to create the picture string instead of hard-coding the default. We create a picture string for the correct number of decimal places with the XPath expression:

'0.' || string-join((1 to $precision) ! '0')

The double bar (||) is a concatenation operator, so it combines the two-character string "0." with whatever follows it. If we read the following part from the inside out, it says to create a sequence of integers from 1 to the supplied value of $precision, so that, for example, if the value is 5 the output of the to expression will be the integer sequence (1, 2, 3, 4, 5). We then use the simple map operator to say “for each item to the left (there are 5 items) do the thing on the right (create a one-character string consisting of a zero character)”. The arity-1 string-join() function says “glue all of those instances of the zero character together”. As a result, if $precision is equal to 5, the expression will evaluate to "0.00000".
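If it helps to watch the picture string being built, the same expression can be broken into one step per variable. This is only an illustrative sketch; the variable names and the sample precision of 3 are ours.

```xquery
xquery version "3.1";

let $precision := 3
let $range := 1 to $precision        (: (1, 2, 3) :)
let $zeros := $range ! '0'           (: ('0', '0', '0') :)
let $joined := string-join($zeros)   (: '000' :)
return '0.' || $joined               (: '0.000' :)
```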

We could have left the arity-1 function in its original form, but by replacing it with a call to the arity-2 version we remove some duplicate code. This is good practice because it reduces the opportunity for inconsistency, since the actual formatted string is always created in the same place (the arity-2 function) no matter which function you call directly.

If you are familiar with user-defined functions in other programming languages you’ll notice that XQuery doesn’t allow functions with automatic default values for parameters that are not specified when the function is called, and that’s because an XQuery function must specify a consistent arity. But because functions with different arities can share a name, the strategy above effectively mimics the behavior of languages with optional parameters and default values. The preceding code, then, makes it possible to format a number (presumed to be a latitude or longitude) either by providing only the geo string as input and using the default precision of 5 or in a more customized way by providing the precision as a second argument.
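As a quick sketch of the two calling styles, the main module below declares both functions and calls each one. The namespace URI and the sample coordinate are hypothetical placeholders.

```xquery
xquery version "3.1";

(: Hypothetical placeholder URI; use the namespace your project declares. :)
declare namespace hoax = "http://example.com/hoax";

(: Arity-1 version: supplies the default precision of 5. :)
declare function hoax:round-geo($input as xs:string) as xs:string {
    hoax:round-geo($input, 5)
};

(: Arity-2 version: builds the picture string from $precision. :)
declare function hoax:round-geo($input as xs:string, $precision as xs:integer) as xs:string {
    format-number(
        number($input),
        '0.' || string-join((1 to $precision) ! '0')
    )
};

(
    hoax:round-geo('51.507351'),     (: default precision :)
    hoax:round-geo('51.507351', 2)   (: caller-supplied precision :)
)
```

With this input the two calls should yield 51.50735 and 51.51 respectively.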

The code for this section is available at 10-user-defined-functions in the functions.xqm file. Recall that we established the functions.xqm module separately from the rest of our modules. functions.xqm is a library module, which means that it isn’t run by itself like the XQuery that creates models or views for pages in the edition. Library modules instead are intended to be imported into regular XQuery files (the technical term is main modules), which can use the functions defined there.
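A main module that uses the library might look roughly like the sketch below. Both the namespace URI and the relative path in the at clause are assumptions made for illustration; they need to match the module namespace declared at the top of your own functions.xqm and the place where that file actually lives.

```xquery
xquery version "3.1";

(: Hypothetical URI and path; they must match your own functions.xqm,
   whose first declaration would be
   module namespace hoax = "http://example.com/hoax"; :)
import module namespace hoax = "http://example.com/hoax" at "functions.xqm";

(: Functions declared in the library module are now available here. :)
hoax:round-geo('51.507351', 2)
```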

Testing

You can troubleshoot your functions by creating a mock input and output in a separate XQuery file in a temporary directory, like a test that helps you during development. We find it helpful to develop functions alongside tests, small XQuery files intended to exercise the functions and compare the actual output to what we expect. Our next stage explains how to write and run tests that can help verify that your user-defined functions behave the way you expect them to behave.

Review

This section explored user-defined functions in greater depth. We introduced the idea that a function has an arity, the number of parameters it requires, and we used arity to create a more extensible and robust implementation of hoax:round-geo() that allows us either to specify the decimal precision or to use a default value of 5 that is built into the function definition.