The yamlfig package provides developers with a framework for defining rules that test and verify a config file's structure. Those rules are captured in a parser object which can be applied to YAML-based config files to validate them.
In particular, this package enables a developer to:
- define which fields are required, optional, or will be assigned default values if omitted;
- declare types for those fields (e.g., str, int, date, dict, list, etc.);
- run arbitrary functions to test the values in each field (e.g., regular-expression matches or file-path-existence checks).
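The last of these capabilities, arbitrary test functions, can be pictured as a callable (or a factory returning one) that accepts a field value and reports whether it is acceptable. The following is an illustrative sketch of the idea, not yamlfig's actual API:

```python
# Sketch of a test-function factory (illustrative, not yamlfig's code):
# the factory captures bounds and returns a callable that checks one value.
def is_interval(lower, upper, include_upper=False):
    def test(value):
        if include_upper:
            return lower <= value <= upper
        return lower <= value < upper
    return test

port_ok = is_interval(1, 65535, include_upper=True)
print(port_ok(81))     # True
print(port_ok(70000))  # False
```

A parser can then apply such a callable to every value matched by a rule, rejecting the config file whenever the callable reports failure.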
After a config file is parsed, validated, and accepted, the returned object can be used to access the field values with some confidence that they exist, are of the expected type, and have whatever other properties the rules established. If a config file is rejected, an error explaining the violated rule is raised.
This package was inspired by the similar capability that argparse brought to command-line argument parsing.
The Python package is hosted on PyPI:
pip install yamlfig
The source code, documentation, and issue tracker are hosted on GitHub:
As an example of when a developer might use yamlfig, consider developing a server that binds to an address and port. When any of a list of authorized users connects, it displays the server's name and the contents of a file. The following YAML file could act as the config file for such a server:
$ cat > quickstart_server.yaml
name: Simple Single-File Server
server:
  port: 81
file_path: 'quickstart_shared_file.txt'
users:
  - alice
  - bob
  - carol
The following script uses the yamlfig package to construct a parser for this example server. It instantiates a parser object and adds a set of rules that establish which fields and structures should be in a server's config file. Those rules establish what form those fields must take and what to do if they are missing. It then invokes this parser on a config file passed as a command line argument. Where an actual server script would then use those values to spin up a server, this script just demonstrates that the values can be accessed from the parsed object by printing them:
$ cat > quickstart_server.py
from __future__ import print_function
import sys
from yamlfig import YamlConfigParser, test
# Construct a parser for the server config file
confp = YamlConfigParser()
confp.add_rule('name', path_type=str)
confp.add_rule('description', optional=True)
confp.add_rule('server.addr', path_type=str, default='127.0.0.1', test=test.is_ipv4_address)
confp.add_rule('server.port', path_type=int, test=test.is_interval(1, 65535, include_upper=True))
confp.add_rule('file_path', path_type=str, test=test.is_file_path('exists', 'isfile'))
confp.add_rule('users', path_type=list)
confp.add_rule('users.*', test=test.is_regex('^[a-z][a-z0-9]*$'))
# Parse the config file
conf = confp.parse_file(sys.argv[1])
# Retrieve values from the conf object
print('conf.name = {0}'.format(repr(conf.name)))
print('conf.description = {0}'.format(repr(conf.description)))
print('conf.server.addr = {0}'.format(repr(conf.server.addr)))
print('conf.server.port = {0}'.format(repr(conf.server.port)))
print('conf.file_path = {0}'.format(repr(conf.file_path)))
for idx in conf.users:
    print('conf.users[{0}] = {1}'.format(idx, repr(conf.users[idx])))
When we execute this server script on the above config file, it prints the following values:
$ python quickstart_server.py quickstart_server.yaml
conf.name = 'Simple Single-File Server'
conf.description = None
conf.server.addr = '127.0.0.1'
conf.server.port = 81
conf.file_path = 'quickstart_shared_file.txt'
conf.users[0] = 'alice'
conf.users[1] = 'bob'
conf.users[2] = 'carol'
Note how the fields and values printed came not only from the config file but also from the rules. Fields marked as optional or taking a default are present even though they were not in the config file. Also note how the values have the type and form required by the rules (e.g., an integer within a given interval, a string in IPv4 dotted-quad notation, and a list of usernames that all match a regular expression).
In this example, the config file specified a name to display, a port to listen on within a server block, a file_path to a file to share, and the list of accounts of authorized users. The rules added to the confp parser ensure that those fields exist; they also define rules for some paths not in the config file that are configured as optional or as taking a default value:
- a description field that, if provided, would be displayed to the user, but that is optional;
- an addr field within the server block that, if provided, would set the binding IP address, but that takes a default of 127.0.0.1 if omitted.
Additionally, the confp parser verifies that the values present in the config file are suitable for our intended use. Some of the values are type-checked or otherwise validated:
- the addr field, if provided, will be tested to confirm that it has the format of an IPv4 address (i.e., a string in dotted-quad notation);
- the port field will have its type checked to ensure it is an int, and its value will be tested to confirm that it is in the range 1 to 65535;
- the file_path will be interpreted as a path to a filesystem object, and that object will be tested to confirm it exists and is a file (rather than a directory);
- all the account names in the users list will be tested against a regular expression to confirm they match the format of accounts on this particular system (i.e., they start with a lowercase letter followed by zero or more digits or lowercase letters).
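The dotted-quad check, for instance, can be pictured with a small standalone function. This is a sketch of the idea only, not yamlfig's actual is_ipv4_address implementation:

```python
import re

def looks_like_ipv4(value):
    # Sketch of a dotted-quad check (not yamlfig's actual implementation):
    # four dot-separated decimal octets, each between 0 and 255.
    if not isinstance(value, str):
        return False
    match = re.match(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$', value)
    if match is None:
        return False
    return all(int(octet) <= 255 for octet in match.groups())

print(looks_like_ipv4('127.0.0.1'))       # True
print(looks_like_ipv4('452.34.256.193'))  # False: an octet exceeds 255
```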
Once the confp parser is constructed and configured, its parse_file method is called on the config-file path given on the command line, and it returns a parsed conf object. To demonstrate that the conf object contains all the fields and values from the config file merged with the optional fields and defaults from the parser rules, the script prints those fields and values.
The values in the conf object returned by parse_file have also been type-checked and tested. Had the fields and values in the config file not conformed to the rules of the confp parser, a ParseError exception would have been raised. Some examples:
- If the name field were omitted: ParseError: quickstart_server.yaml: "name" is missing
- If the server block contained a field called the_ip_address that did not match any of the parser's rules: ParseError: quickstart_server.yaml: "server.the_ip_address" unexpected by parser
- If the port field of the server block contained the string "eighty-one" rather than the integer 81: ParseError: quickstart_server.yaml: "server.port" has type str not type int
- If the addr field were present in the server block and had the value 452.34.256.193: ParseError: quickstart_server.yaml: "server.addr" failed test: 1st octet of "452.34.256.193" exceeds 255
- If the file_path field had been the path to an existing directory named some_directory instead of the path to an existing file: ParseError: quickstart_server.yaml: "file_path" failed test: "some_directory" is not a file
- If the 3rd value of the users list had been the display name Carol C. instead of the username carol (and noting zero-based indexing): ParseError: quickstart_server.yaml: "users.2" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/
The presence of such errors in the config file would have stopped execution and provided a relatively informative explanation of which rule failed and why. Because none of these errors were raised, a developer has some assurance that the structure and values in the conf object meet their expectations.
Just as important as what happened in the example above is what didn't happen. When the parse_file method returned the conf object, it didn't raise a ParseError exception. Since it executed successfully, we know that all the parser assertions hold about which fields must exist and what formats they take; the remaining code does not need to perform such checks and error handling itself.
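To appreciate what the parser saves us, here is a sketch of the kind of ad-hoc checking that code consuming a plain dict would otherwise need before every use (illustrative only; the field names follow the quickstart example):

```python
# Without a validating parser, code would need manual checks like these
# scattered ahead of every use of the config (illustrative sketch):
def manual_checks(raw):
    if 'name' not in raw:
        raise ValueError('"name" is missing')
    if not isinstance(raw.get('server', {}).get('port'), int):
        raise ValueError('"server.port" must be an int')
    if not 1 <= raw['server']['port'] <= 65535:
        raise ValueError('"server.port" out of range')
    return raw

conf = manual_checks({'name': 'demo', 'server': {'port': 81}})
print(conf['server']['port'])  # 81
```

yamlfig centralizes all of these checks in the parser rules, so they run once, in one place, at parse time.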
What yamlfig provides beyond a standard YAML parser is validation: verification that a config file conforms to the rules established for it. In this section, we introduce and describe these rules and the constraints they can place on a config file's structure and values.
The typical steps when using yamlfig are:
1. instantiate a YamlConfigParser object, which we usually call confp;
2. configure it by using add_rule to add rules for each field we intend to control through a config file;
3. invoke parse_file on a config file, which either raises a ParseError or returns a YamlConfig object, usually called conf; and
4. use that YamlConfig in subsequent code, confident that its structure and values have already been validated.
The following script illustrates this typical pattern by using yamlfig. For the sake of the example, let's say we need a config file to drive how often a loop is run, which of two functions is called by the loop, and what parameter is passed to that function:
$ cat basic_usage.py
import sys
from yamlfig import YamlConfigParser
# 1. Instantiate a YamlConfigParser object (confp)
confp = YamlConfigParser()
# 2. Configure the parser by adding rules for each field
confp.add_rule('loop_count')
confp.add_rule('do_special_function')
confp.add_rule('function_parameter')
# 3. Invoke the parser on a config file (provided as an argument)
conf = confp.parse_file(sys.argv[1])
# 4. Use the YamlConfig object in subsequent code
for loop_index in range(conf.loop_count):
    if conf.do_special_function:
        special_function(conf.function_parameter)
    else:
        regular_function(conf.function_parameter)
One config file would cause the script to produce one behavior:
$ cat basic_config_1.yaml
loop_count: 7
do_special_function: yes
function_parameter: "a meerkat"
Running the script on basic_config_1.yaml would cause special_function to be invoked 7 times, each time with the parameter "a meerkat".
Another config file would cause the script to produce a different behavior:
$ cat basic_config_2.yaml
loop_count: 3
do_special_function: no
function_parameter: a pony
Running the script on basic_config_2.yaml would cause regular_function to be invoked 3 times, each time with the parameter "a pony".
Just as important is understanding the behavior of the script on a bad config file. The following config file is missing one of the three required fields:
$ cat basic_config_bad.yaml
loop_count: 3
function_parameter: 'a unicorn'
Running the script on basic_config_bad.yaml exits unsuccessfully and prints an exception:
Traceback (most recent call last):
File "basic_usage.py", line 22, in <module>
conf = confp.parse_file(sys.argv[1])
[...]
yamlfig.base.ParseError: basic_config_bad.yaml: "do_special_function" is missing
The error is raised within the confp.parse_file function. All verification and validation occurs as part of that function, called in step 3 of the pattern, so if it returns successfully, the YamlConfig object conforms with the parser rules.
The fundamental thing that yamlfig rules do is establish which fields should be in a config file and which fields should not.
confp.add_rule('dirname')
The first argument to add_rule is the rule_path. Every rule added to a confp object must have one, and, unless additional modifiers make the field optional or give it a default value, it is an existence requirement for the field. Given the rule above, any config file must contain a line such as:
dirname: /var/share/SomeApp/SharedDir
A config file without a dirname field would generate a parse error.
Part of YAML's descriptive power comes from its ability to encode nested structures, like maps and lists, and yamlfig rules can describe constraints on that structure:
confp.add_rule('server.storage.dirname')
Rules use the '.' character to delimit fields within a nested structure. The rule above expects there to be a server block, within which is contained a storage block, within which there is a dirname field. The following config would satisfy such a rule:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
Such a rule path implicitly includes existence requirements for server and server.storage. The existence of those paths would not need to be explicitly required through separate rules, unless we wanted to modify them (e.g., by making them optional or giving them defaults, as described in a later section).
In this example, we need a config file to describe how a server's local storage cache is configured (i.e., where it is on the filesystem, how big it can grow, and what permissions the cache files have).
confp.add_rule('server.storage.dirname')
confp.add_rule('server.storage.maxsize')
confp.add_rule('server.storage.umask')
In combination, these three rules describe the structure that the server.storage block must have (i.e., three fields with the names dirname, maxsize, and umask).
The following config file would be accepted by this parser:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
    maxsize: 10GB
    umask: 0644
Any field or path for which there is no matching rule will raise a parse error. In a sense, the yamlfig field-existence validation is deny-by-default. We do not need to do anything specific to assert that a field does not exist; just don't add an existence requirement.
A rule path can contain wildcards. For the sake of this example, we need a config file to specify upload paths for each of one or more users. A server.upload_paths block will map from username to the corresponding upload directory, as in the following example:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
Since we do not want to hardcode the usernames in the parser, we can use a wildcard rule to accept one or more fields within a block without specifying the field names:
confp.add_rule('server.upload_paths.*')
Such a rule asserts that the server.upload_paths block contains non-empty substructure (i.e., it is a block), but it does not name the specific fields within that substructure. In the above example config file, the wildcard would match both alice and bob, even though neither is an explicitly listed field path.
A new user could be added with their own upload path, and the same parser would accept the config file:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
    carol: /home/carol/tmp
Note that a wildcard rule must match one or more fields, not zero. If the server.upload_paths block were empty, parsing the config file would raise an error: "server.upload_paths" must contain at least one field. A later segment describes how the optional and default flags can be used with wildcards to implement a zero-or-more match.
Also note that partial wildcard matches are not currently supported. A path like server.upload_paths.user-* intending to accept fields like user-alice and user-bob would instead raise an error. As described in a different later segment, one way to implement such a check would be to write a function that tests every field within a block against a regular expression, and then specify that function as a test function for the server.upload_paths block.
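Such a block-level test function might look like the following sketch. The calling convention assumed here (the test receiving the block and iterating its field names) is illustrative, not a confirmed yamlfig signature:

```python
import re

# Sketch of a block-level test that checks every field name against a
# pattern, approximating a partial wildcard like "user-*". The way the
# block is passed to the test function is an assumption for illustration.
def fields_match(pattern):
    regex = re.compile(pattern)
    def test(block):
        return all(regex.match(field) for field in block)
    return test

check = fields_match(r'^user-[a-z]+$')
print(check({'user-alice': 1, 'user-bob': 2}))  # True
print(check({'alice': 1}))                      # False
```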
For the sake of this example, a server hosts one or more projects, each of which has an associated directory of static web pages and a backend database. Our configuration file maps from one or more arbitrary project names (e.g., ProjectX and meerkat_works) to blocks that contain precisely three datapoints: (1) a path to a directory of webpages, (2) a path or URL to a database, and (3) the type of the database (e.g., sqlite, mysql, or mongodb).
The following config file gives an example of this structure:
projects:
  ProjectX:
    webpath: /home/alice/projx/html
    dbpath: /home/alice/projx/project.db
    dbtype: sqlite
  meerkat_works:
    webpath: /home/bob/public/meerkat/www
    dbpath: mongodb://192.168.1.200:27017
    dbtype: mongodb
With yamlfig, we can specify wildcards on paths while still requiring that any fields matching the wildcard have a required, fixed-field substructure. The following parser will accept one or more blocks, each corresponding to a project name, but every one must have the three required fields:
confp.add_rule('projects.*.webpath')
confp.add_rule('projects.*.dbpath')
confp.add_rule('projects.*.dbtype')
These rules implicitly assert that the projects block exists and contains one or more sub-blocks, with no restriction on their field names. The rules explicitly assert that each of those sub-blocks must contain exactly three fields: webpath, dbpath, and dbtype.
This parser would accept the example config file above.
If a project sub-block were missing one of the three required fields or had an extra field, an error would be raised.
YAML's nested structure supports not only the mappings described above but also lists. Technically, YAML offers many different nesting types (e.g., omap, pairs, etc.), but our Python parser represents them all as either dict or list objects, with mappings represented as dict objects and lists as list objects. To accommodate both mappings and lists most easily and succinctly, yamlfig effectively treats lists as a very specific kind of mapping, where each field is a list index and each value is the item in the list.
Consider a config file where a users block contains a list of one or more authorized users:
users:
  - alice
  - bob
  - carol
The following rule would accept such a users block:
confp.add_rule('users.*')
Note that the wildcard rule by itself just ensures that there are subfields, not that they take the form of a list as opposed to a mapping. That same parser would accept a config file with a mapping:
users:
  alice: Alice A.
  bob: Bob B.
  carol: Carol C.
These two config files, the list version and the mapping version, have very different structures, and a program would likely expect one and not the other. To ensure that a rule with a wildcard matches only a list (and not a mapping) or only a mapping (and not a list), we would need to use type checking, as described in a later segment, to assert that the type of the block is either list or dict, respectively.
Also note that when accessing list values parsed into a conf object, we need to be aware of some differences in their behavior from that of a standard Python list, as described in the Handling Parsed Objects section.
While not a typical occurrence, parser rules can be configured to ensure that a list has a specific number of elements. The following rules would accept a list of length 2 by explicitly requiring fields named 0 and 1:
confp.add_rule('network.route.0')
confp.add_rule('network.route.1')
For the sake of the example, perhaps the application must have two network routes, a primary and a secondary.
As noted above, yamlfig treats lists as mappings from numeric fields to values, so the parser would accept the following config:
network:
  route:
    - 192.0.2.1
    - 198.51.100.1
As a side note, the same two rules would accept a config in which the route block contained a mapping from numeric string fields (i.e., "0" and "1") to the two IPv4 addresses. As we keep stressing, the rules simply treat lists as mappings from numeric fields to the list elements. To differentiate a list from a mapping, we would need to use type checking, as described in a later segment.
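The distinction itself is an ordinary Python type check; yamlfig's path_type parameter (e.g., path_type=list, as seen in the quickstart) expresses it as a rule. As a standalone sketch of what such a check does:

```python
# Sketch of the type check underlying a rule like path_type=list
# (illustrative; yamlfig expresses this via the path_type parameter).
def has_type(expected):
    def test(value):
        return isinstance(value, expected)
    return test

print(has_type(list)(['192.0.2.1', '198.51.100.1']))  # True
print(has_type(list)({'0': 'a', '1': 'b'}))           # False
print(has_type(dict)({'0': 'a', '1': 'b'}))           # True
```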
As described in the Basic Usage section, to parse a config file, a parser's parse_file method would be called with the name of the file:
conf = confp.parse_file(conffile)
Assuming the parsing and validation succeed, the conf object would have type YamlConfig or YamlConfigList, depending on whether the root-level YAML object in the config file is a mapping or a list. Typically, a YAML-formatted config file will have a mapping as its root-level structure, so we will consider that common case first. Throughout this section, assume that we have successfully parsed the following config file into a YamlConfig object named conf:
dirname: /var/share/SomeApp/SharedDir
server:
  projects:
    ProjectX:
      webpath: /home/alice/projx/html
      dbpath: /home/alice/projx/project.db
      dbtype: sqlite
    meerkat_works:
      webpath: /home/bob/public/meerkat/www
      dbpath: mongodb://192.168.1.200:27017
      dbtype: mongodb
users:
  - alice
  - bob
  - carol
  - dave
While the focus of this section is on accessing the conf object after confp successfully parses and validates the config file, for the sake of completeness, the following rules would configure a parser that accepts this file:
confp = YamlConfigParser()
confp.add_rule('dirname')
confp.add_rule('server.projects.*.webpath')
confp.add_rule('server.projects.*.dbpath')
confp.add_rule('server.projects.*.dbtype')
confp.add_rule('users.*')
Fields in a config file can be accessed as attributes of the YamlConfig object.
conf.dirname # '/var/share/SomeApp/SharedDir'
If an attribute corresponds to a block in a config file, it will return that block as a YamlConfig or YamlConfigList object.
conf.server # <YamlConfig object at 0x[...]>
conf.users # <YamlConfigList object at 0x[...]>
As such, attributes can be strung together in a sequence:
conf.server.projects.ProjectX.webpath # '/home/alice/projx/html'
conf.server.projects.ProjectX.dbpath # '/home/alice/projx/project.db'
conf.server.projects.ProjectX.dbtype # 'sqlite'
conf.server.projects.meerkat_works.dbtype # 'mongodb'
Note that to be accessed as an attribute, a field name must be a valid Python identifier (e.g., it must be a string, it cannot start with a digit, etc.).
Values stored in a YamlConfig object can also be accessed via index lookup.
conf.server.projects['ProjectX'].dbtype # 'sqlite'
proj = 'ProjectX'
conf.server.projects[proj].dbtype # 'sqlite'
conf['server']['projects']['ProjectX']['dbtype'] # 'sqlite'
path = ['server', 'projects', 'ProjectX', 'dbtype']
functools.reduce(lambda d, idx: d[idx], path, conf) # 'sqlite' (assumes import functools)
Index lookups must be used to access the elements of a YamlConfigList, since attribute names cannot be numbers.
conf.users[0] # 'alice'
conf.users[1] # 'bob'
conf.users[2] # 'carol'
conf.users[3] # 'dave'
conf.users[-1] # 'dave'
conf.users[-2] # 'carol'
In a departure from standard Python lists, a YamlConfigList object will translate to or from a string representation of an index as needed.
conf.users["1"] # 'bob'
conf.users['-2'] # 'carol'
Once again, this is to allow lists, as much as possible, to be treated like mappings from the list indexes to the list elements.
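The index translation behaves roughly as if the list were wrapped as in the following sketch (illustrative only, not yamlfig's actual implementation):

```python
# Sketch of the index translation described above (not yamlfig's code):
# integer or string indexes, positive or negative, all reach the same item.
class StringIndexList:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # Convert "1" or "-2" (or plain ints) to an int before indexing.
        return self._items[int(index)]

users = StringIndexList(['alice', 'bob', 'carol', 'dave'])
print(users[1])     # 'bob'
print(users['1'])   # 'bob'
print(users['-2'])  # 'carol'
```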
As with dict and list objects, we can see how many elements are in a YamlConfig or YamlConfigList object by querying its length.
len(conf) # 3
len(conf.server) # 1
len(conf.server.projects) # 2
len(conf.server.projects.ProjectX) # 3
len(conf.server.projects.meerkat_works) # 3
len(conf.users) # 4
Iterating on a YamlConfig object will return the field names contained within the block, like what we would get from iterating on a dict object:
list(conf) # ['dirname', 'server', 'users']
list(conf.server) # ['projects']
list(conf.server.projects) # ['ProjectX', 'meerkat_works']
list(conf.server.projects.ProjectX) # ['webpath', 'dbpath', 'dbtype']
list(conf.server.projects.meerkat_works) # ['webpath', 'dbpath', 'dbtype']
Note that the order in which YamlConfig fields are returned is the order in which the rules were added to the parser, not the order in which the fields appear in the config file. When a single parser rule matches multiple fields (i.e., a wildcard rule), the fields are returned in arbitrary order. Note, however, that around Python 3.6 and 3.7, such fields started being returned in the order they appear in the config file, likely because dict objects began returning keys in insertion order.
Iterating on a YamlConfigList object is significantly different from iterating on a Python list. In particular, it will return the list of index values as strings, not the actual list values:
list(conf.users) # ['0','1','2','3']
As noted previously, a YamlConfigList treats lists less like lists per se and more like mappings from zero-based, sequential, numeric indexes to values. As such, its iterator returns field names that can be used as indexes to look up values, not the values themselves.
This behavior is likely unexpected at first and arguably controversial, but it was chosen for greater overall simplicity. A lot of code can iterate over fields, descend into blocks, and so on much more simply when it does not need to treat YamlConfigList objects as a special case, separate from YamlConfig objects.
To get the values rather than the indexes, we recommend a list comprehension:
[conf.users[idx] for idx in conf.users] # ['alice', 'bob', 'carol', 'dave']
Unlike the fields of a YamlConfig, the indexes of a YamlConfigList will be returned in a specific order: sequential and increasing from a base of zero.
Having examined how to configure a parser to require certain fields and structure, and how values will be represented in the parsed object, we introduce ways to make fields optional, give them default values, and have the parser ignore their substructure.
When instantiating and adding a new rule, we can specify optional=True:
confp.add_rule('name')
confp.add_rule('description', optional=True)
The above parser would require a name field but not a description field, as in the following config file:
name: Simple Single-File Server
The parser will accept the file, create a description field, and assign it the value None.
conf.name # 'Simple Single-File Server'
conf.description # None
A program acting on the conf object can assume that the optional field exists, but it will have the value None if it was not present in the config file (or if it was explicitly assigned the value None, since the two are treated as equivalent).
A rule representing a nested block can be marked optional and still have substructure with required fields. For the sake of example, a server requires three files in order to encrypt its communications using SSL. If a server.ssl block is present in the config file, those files must be provided, and the server will use SSL. If the block is omitted, the config file should still be accepted, but the server will fall back to unencrypted communications.
The following parser is configured with an optional server.ssl block that, if it exists, must have three specific fields:
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
In the following config, the optional ssl block and its substructure have been omitted:
server:
  addr: 127.0.0.1
  port: 81
Since the block was not included, the ssl field is present in the conf object but assigned a value of None.
conf.server.ssl # None
In the following config, the optional ssl block and its substructure have been included:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
Since the block was included, its substructure was parsed and validated. The conf object includes the block and its substructure.
conf.server.ssl # <YamlConfig object at 0x[...]>
conf.server.ssl.key # '/etc/ssl/privkey.pem'
conf.server.ssl.cert # '/etc/ssl/cert.pem'
conf.server.ssl.chain # '/etc/ssl/full_chain.pem'
The existence requirements on the substructure will only be checked and enforced if the optional field is present. In the following config, the optional ssl block is present, but it is missing one of its required fields:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    # cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
When parse_file is invoked on this config file, a ParseError is raised: "server.ssl.cert" is missing.
When instantiating and adding a new rule, we can specify a default.
confp.add_rule('server.addr', default='127.0.0.1')
confp.add_rule('server.port')
In the following config, the default rule has been omitted:
server:
  port: 81
The parser will accept the file, create not only a port field but also an addr field within the server block, and, since the addr field does not appear in the config, assign the default value (127.0.0.1) to that field.
conf.server.port # 81
conf.server.addr # '127.0.0.1'
The following rule will provide an entire server block if none is specified in the config file:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
Providing such structure is possible, but the above rule would generate a ParseError unless it were accompanied by rules to accept the server.addr and server.port paths. With only the rule above, a config file that triggered the default would raise a ParseError: "server.addr" unexpected by parser.
We need to add rules to prepare the parser for the substructure, as in the following parser that accompanies the default rule with two more:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
With these two additional rules, a config file will be accepted with the default values if server is omitted, and those two values will be required if a server block is present. In both cases, once parsing succeeds, the program can assume that conf.server.addr and conf.server.port exist.
If we really did not want to validate the fields of the default substructure, rather than adding rules for the fields, we could mark the block as no-follow as described in a later segment.
The optional and default parameters to add_rule are mutually exclusive; if both are specified, an error will be raised.
Essentially, optional=True acts like a default rule for which the default value is None. In fact, setting optional=True is the only way for a missing field to be assigned a value of None, since setting default=None is a no-op. A value of None for default actually signals that no default has been specified, so the field is still required.
It is unclear what the semantics would even be for an optional rule that also takes a default, so the pairing is just not allowed.
In the following parser configuration, the server block takes a default, while the server.ssl block is optional:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
We can see what will happen with the following config, where the server field is omitted. Note that this config file uses the convention that leaving a field value blank assigns it a value of None (or null in YAML terms), which causes it to be treated as omitted by yamlfig:
server:
The parser above will accept this config file. Since the server block has been omitted, it will substitute its default value. Since the server.addr and server.port fields are provided by the default, they will pass the rules requiring their existence. The server.ssl field has not been provided by the default, but since it is flagged as optional, the field will be created and assigned a value of None:
conf.server # <YamlConfig object at 0x[...]>
conf.server.addr # '127.0.0.1'
conf.server.port # 81
conf.server.ssl # None
The reverse is also true. Default fields can be included in the substructure of an optional field, and they will take the default values if they are omitted from the config but the optional block is included. Other combinations work as well (e.g., default fields within default blocks; optional fields within optional blocks; optional fields within default blocks within optional blocks; etc.).
If it helps, we can think of optional and default flags as being handled from the top down in a cascade. If a parent field is omitted, it is checked for optional or default flags first. If it is optional, the field is created with a None value and parsing moves on. If it takes a default, the field is created with the default value or substructure, and the parser descends into that substructure, checking those fields and values before moving on. The parser only encounters child fields and values after the parent's optional or default nature has been handled.
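The cascade can be sketched in plain Python. This is an illustrative model only, not yamlfig's actual implementation; the resolve function and its dict-based rule format are hypothetical stand-ins for the parser's internals:

```python
# Illustrative sketch of the top-down optional/default cascade.
# Not yamlfig's actual code; "rules" is a hypothetical mapping
# from field name to a dict of flags.

def resolve(config, rules):
    """Fill in omitted fields according to optional/default flags."""
    result = dict(config or {})
    for field, rule in rules.items():
        if result.get(field) is None:  # omitted (or explicitly null)
            if rule.get("optional"):
                result[field] = None             # create with None, move on
            elif "default" in rule:
                result[field] = rule["default"]  # substitute the default...
            else:
                raise ValueError('"%s" is required' % field)
        # ...then descend into any substructure to check child rules
        if isinstance(result[field], dict) and "children" in rule:
            result[field] = resolve(result[field], rule["children"])
    return result

rules = {
    "server": {
        "default": {"addr": "127.0.0.1", "port": 81},
        "children": {
            "addr": {},
            "port": {},
            "ssl": {"optional": True},
        },
    },
}

conf = resolve({}, rules)
print(conf["server"]["addr"])  # 127.0.0.1
print(conf["server"]["ssl"])   # None
```

With an empty config, the server default is substituted first, and only then are the child rules checked against that substituted block, which is why the required addr and port rules pass.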
As noted earlier, a wildcard rule path requires that a config file have one or more fields matching the path. By default, a wildcard rule will raise an error if there are no fields matching it, but there are times when we want to accept zero-or-more matches.
The following rules configure a parser for cases where we want a block with zero or more subfields:
confp.add_rule('server.upload_paths', default={})
confp.add_rule('server.upload_paths.*', optional=True)
The optional flag on the wildcard path will cause the parser to allow the upload_paths block to contain no fields. The default on the upload_paths field will create that empty block if the field is null. The following config file would be accepted by this parser:
server:
  upload_paths:
By leaving upload_paths null, we signal that it has been explicitly omitted, so it is given its default value (i.e., an empty mapping). Then, since the wildcard path for the fields within upload_paths is flagged as optional, that rule will be satisfied with zero fields. The YamlConfig object would exist but contain zero fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 0
For completeness' sake, the same parser would also accept a config file with one or more fields within upload_paths:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
In this case, the YamlConfig object would contain two fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 2
set(conf.server.upload_paths) # {'alice','bob'}
This pattern—with the block taking an empty substructure as the default and the wildcard rule flagged as optional—is the recommended way to implement a parser that accepts zero-or-more fields or list elements.
In some cases, we want to stop a yamlfig parser from attempting to validate a substructure, either because the program is designed to handle whatever is beneath that value or, more often, because the actual structure follows a complicated syntax and the program will simply pass it along to another package that has its own functions for validating the input.
For example, consider a program that needs to pull back a list of projects from a MongoDB database. The following config file provides values that might be needed to (1) access the database, (2) reference the specific collection within the database, and (3) filter the results to only a subset of all projects, using a MongoDB query:
mongodburl: mongodb://192.168.1.200:27017/
collection: projects
filterquery: { 'is_private': { '$ne': true } }
All three fields must exist, but the filterquery field contains a MongoDB query as its substructure. MongoDB queries can be expressed as JSON objects, and YAML syntax is a superset of JSON, so the query can be expressed as JSON/YAML right within the YAML config file. By default, the yamlfig parser will try to validate that object (i.e., check whether the paths filterquery.is_private and filterquery.is_private['$ne'] are expected by the parser). Configuring a confp parser to correctly validate the syntax of an arbitrary MongoDB query is impractical and an unnecessary source of complexity. As soon as the script hands the query off to MongoDB, MongoDB will do a much better job of validating it.
The following parser is configured to accept the above config file:
confp.add_rule('mongodburl')
confp.add_rule('collection')
confp.add_rule('filterquery', nofollow=True)
This parser will require that a filterquery field exists along with the mongodburl and collection fields, but the nofollow argument ensures that it will not descend into the substructure within the filterquery field. No additional validation of that substructure will take place.
The value at conf.filterquery is a standard Python dict which can be passed to a MongoDB find command as-is:
conf.filterquery # {'is_private': {'$ne': True}}
As an aside, note that until this example, we have been using YAML block-structure syntax rather than JSON syntax, but there is no difference between the two formats once parsed. The config file above could have been written equivalently as follows:
mongodburl: mongodb://192.168.1.200:27017
collection: projects
filterquery:
  is_private:
    "$ne": true
A distinct alternative would have been to encode the MongoDB query object as a string, as in the following line:
filterquery: "{ is_private: { $ne: true } }"
While this is a viable alternative, there are benefits to not doing so. By storing the query as a structured object, we actually do perform some syntax checking at parse time, before handing it off to MongoDB: we ensure that the brackets are balanced and the JSON is legal. We also get whatever syntax highlighting our editor provides for YAML/JSON. A string would simply be treated as a string by the yamlfig parser, and we would eventually have to invoke a JSON parser ourselves.
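To illustrate that last point: if the query were stored as a string, the program would have to parse it itself before handing it to MongoDB, for example with the standard json module (and the string would then have to be strict JSON, with quoted keys and lowercase true):

```python
import json

# The structured form: yamlfig hands us a dict directly.
structured = {'is_private': {'$ne': True}}

# The string form: we must invoke a parser ourselves, and the
# string must be strict JSON (quoted keys, lowercase true).
as_string = '{ "is_private": { "$ne": true } }'
parsed = json.loads(as_string)

print(parsed == structured)  # True
```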
The nofollow parameter really affects the handling of the value, not the field, whereas optional and default are parameters that affect the handling of the field (i.e., what to do if it is omitted).
If a field is omitted from a config, and its path is marked in the parser as both optional and no-follow, the field will be created and assigned the value None. (If it were not optional, an error would be raised.) Since the value None is terminal and has no substructure, being marked no-follow has little effect.
If a field is omitted from a config, and its path is marked in the parser as no-follow and taking a default, the field will be created and the default value will be substituted. If the default value has substructure (i.e., it is a dict or a list), then the no-follow marking applies and no additional validation is performed by the parser on that substructure.
A parser configuration such as the following would raise an error:
confp.add_rule('filterquery', nofollow=True)
confp.add_rule('filterquery.is_private')
The no-follow condition on a path means that no rules on descendant paths will ever be checked or validated, so we prevent such rules from being added. For this example, a ValueError would be raised explaining that "filterquery.is_private" is a descendant of a no-follow rule.
After establishing that every required field exists, that every optional or default field has been handled, and that there are no unexpected fields, the yamlfig parser's next step is to check that any type assertions on the values for each field are satisfied.
When instantiating and adding a new rule, we can specify a path_type:
confp.add_rule('server.addr', path_type=str)
confp.add_rule('server.port', path_type=int)
In addition to requiring that the server block contains an addr field and a port field, these rules will further check that the values are instances of the given path_type types.
The following config has a null in the addr field:
server:
  addr: ~
  port: 81
The above parser would raise an error: "server.addr" has type NoneType not type str.
Likewise, the following config has a string in the port field:
server:
  addr: 127.0.0.1
  port: "81"
The above parser would raise an error: "server.port" has type str not type int.
The type that a value takes is determined by the underlying raw-YAML parser that yamlfig uses. By default, we use SafeLoader within PyYAML. It recognizes the following types:
- bool
- str
- unicode (in Python 2, when the value contains non-ASCII characters)
- int
- long (in Python 2, when the value is larger than sys.maxint)
- float
- date (in the datetime package)
- datetime (in the datetime package)
- dict (for mappings and mapping-like tags)
- list (for lists and list-like tags)
- NoneType (i.e., path_type=type(None))
Any of those types could arise in a config and be accepted or rejected by a path_type argument. Additionally, it is possible to replace SafeLoader with a different YAML parsing class, in which case the set of types would depend on what types that class constructed.
In the following config, the timeout field will be parsed as a float:
server:
  timeout: 1.2
But in the following config, the timeout value will be parsed as an int:
server:
  timeout: 1
Assuming the underlying server wants a float but handles the conversion of an int by itself, we don't really want to force a user to add spurious decimal points (e.g., changing 1 to 1. would ensure the parser returns a float) just to achieve a single consistent type across all configs.
The following parser configuration rule will accept a timeout that is either an int or a float:
confp.add_rule('server.timeout', path_type=(int, float))
By specifying a tuple of types, we can direct yamlfig to accept values that are instances of either type.
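The tuple form mirrors Python's built-in isinstance, which likewise accepts a tuple of types. The helper below is a minimal stand-in for what such a check does, not yamlfig's code:

```python
# isinstance accepts a tuple of types: the check passes if the value
# is an instance of any type in the tuple.  Illustrative helper only.
def accepts_int_or_float(value):
    return isinstance(value, (int, float))

print(accepts_int_or_float(1))     # True
print(accepts_int_or_float(1.2))   # True
print(accepts_int_or_float("1"))   # False
```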
This feature was a lot more urgent in Python 2, where we almost always wanted a string to be checked against (str, unicode), so that the appearance of a word with an accent or umlaut in a descriptive string wouldn't suddenly cause our config file to be rejected. Things have gotten calmer with Python 3 (e.g., str vs unicode and int vs long are no longer issues), but union types do still arise (e.g., int vs float).
Consider this parser configured to accept mappings from project names to descriptions:
confp.add_rule('projects', path_type=dict)
confp.add_rule('projects.*')
Contrast it with this parser configured to accept lists of authorized user names:
confp.add_rule('users', path_type=list)
confp.add_rule('users.*')
In both cases, the wildcard rule accepts one-or-more arbitrary fields within the block, but as noted in an earlier segment, the wildcard does not distinguish a map from a list. The path_type=dict constraint is what ensures that the first parser accepts config files with a map, like this:
projects:
  ProjectX: "Project X is an eXtreme project (for more info talk to Alice)"
  meerkat_works: "Bob's not-quite skunkworks project"
The path_type=list constraint is what ensures that the second parser accepts config files with a list, like this:
users:
- alice
- bob
- carol
Since few programs are written to expect either a mapping or a list, we typically want to use type checking to ensure that a config-file block contains the expected structure.
While we have so far considered YAML files with a map structure at the root level, a YAML file could also be a list:
- addr: 192.0.2.200
  port: 81
- addr: 192.0.2.201
  port: 81
- addr: 198.51.100.15
  port: 8080
- addr: 203.0.113.130
  port: 8080
In this example, these address-port pairs might be a list of mirrors, ordered by proximity. The following parser is configured to check that the YAML file itself is a list, and then that each element of the list has the proper substructure:
confp = YamlConfigParser(path_type=list)
confp.add_rule('*.addr', path_type=str)
confp.add_rule('*.port', path_type=int)
After reading and validating the above config file, this parser returns a YamlConfigList object:
len(conf) # 4
conf[0].addr # '192.0.2.200'
conf[0].port # 81
Note that in this example, we actually included the instantiation of the YamlConfigParser as confp. All our previous examples (after Basic Usage) assumed that step. But when we are making assertions about the top-level object parsed from the config file, those are configured as part of the YamlConfigParser instantiation.
Down deep, every rule that gets added to the parser is of type YamlConfigRule. The YamlConfigParser class inherits from YamlConfigRule, and any arguments are used to validate the root-level object rather than any particular field or path within the object. The primary difference between the root YamlConfigParser object and the YamlConfigRule objects that are added to it is that the YamlConfigParser cannot have a rule_path while the other objects must have one. Additionally, the root-level object cannot be optional or take a default. (It can be flagged no-follow, though.)
A yamlfig parser will not accept a config file without any substructure. Technically, this is a valid YAML file:
42
A standard YAML parser will parse it as an int. However, yamlfig will raise an error: config is a(n) int but a record or list is expected.
Honestly, if our program must accept config files consisting of a single value, yamlfig might not be the right tool for the situation. If we still desired to make a go of it, we could nest that value in a single-field mapping, like so:
number: 42
The following parser would accept that config file, with a single field whose value is an int:
confp = YamlConfigParser(path_type=dict)
confp.add_rule('number', path_type=int)
Even more concisely, we could nest the value in a singleton list:
- 42
The following parser would accept such a config file:
confp = YamlConfigParser(path_type=list)
confp.add_rule('0', path_type=int)
Note the space between - and 42. Without it, the singleton list collapses back to a single (negative) integer:
-42
Such a single value would not be accepted.
While type checking helps validate the values in a config file, we often want to place additional constraints on those values. For instance, we might want a value not only to be an int but also to fall within a particular range. We might want another value not only to be a str but also to match a regular expression. We might want a third value not only to be a str but also to point to an existing file.
When adding a rule to a parser, we can specify a test function using the test argument, to perform additional checking of values. The yamlfig.test module contains a variety of pre-packaged test functions for some common validation scenarios.
This parser rule will constrain username to start with a lowercase letter followed by zero or more lowercase letters or numbers:
confp.add_rule('username', test=test.is_regex('^[a-z][a-z0-9]*$'))
This config file would be accepted by such a parser:
username: 'carol57'
This config file would be rejected:
username: 'Carol C.'
The parse error would include the explanation "username" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/.
This parser rule will constrain the addr field of the server block to be a valid dotted-quad IPv4 address:
confp.add_rule('server.addr', test=test.is_ipv4_address)
The yamlfig.test module that contains these common validation tests has been directly imported as the test object in the examples of this documentation, but it would otherwise be accessed as yamlfig.test (e.g., test=yamlfig.test.is_regex('^[a-z][a-z0-9]*$')).
The following test functions are available in the yamlfig.test module:
- is_interval(lower, upper) verifies that the value is within the range defined by the lower and upper bounds;
- is_regex(regex, invert=False) verifies that the value matches the regular expression (or does not match it, if inverted);
- is_ipv4_address verifies that the value is an IPv4 address in dotted-quad notation;
- is_domain_name verifies that the value conforms to the specification of a DNS domain name (which, note, is a looser constraint than that it be an actual operating and reachable domain name);
- is_email_address verifies that the value (roughly) conforms to the specification of an email address;
- is_url verifies that the value (roughly) conforms to the specification for URLs;
- is_file_path(*ostests) takes one or more strings corresponding to properties of filesystem objects, interprets the value as a filesystem path, and verifies that the path satisfies all of the listed properties. Properties include:
  - 'exists' and '!exists': the path exists (or does not);
  - 'isdir' and '!isdir': the path is a directory (or is not);
  - 'isfile' and '!isfile': the path is a file (or is not);
  - 'islink' and '!islink': the path is a symlink (or is not);
  - 'ismount' and '!ismount': the path is a mount point (or is not).
Note that for the higher-order functions (i.e., the ones that return test functions suitable for test), not all optional parameters are shown and described here. Check the help documentation for each function for additional detail on usage and options.
The test parameter to add_rule takes a function with three parameters: conf, path, and value. It signals acceptance of the value by returning None, and rejection by returning a string explaining what caused the failure.
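A minimal custom test following that contract might look like the following. The function name and messages are our own; only the (conf, path, value) signature and the None-or-string return convention come from the specification above:

```python
# A minimal test function following the (conf, path, value) contract:
# return None to accept the value, or a string explaining the rejection.
def is_port_number(conf, path, value):
    if not isinstance(value, int):
        return '%r is not an integer' % (value,)
    if not (1 <= value <= 65535):
        return '%d is not in the range 1-65535' % value
    return None

print(is_port_number(None, 'server.port', 81))     # None (accepted)
print(is_port_number(None, 'server.port', 70000))  # explanation string
```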
Some of the test functions packaged within yamlfig, like test.is_ipv4_address, directly match that specification. Others, like test.is_regex, return a function that matches the specification based on their arguments.
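The higher-order style can be sketched as a closure: the outer function takes configuration arguments and returns an inner function with the (conf, path, value) signature. This is an illustration of the pattern, not yamlfig's actual is_regex:

```python
import re

# Sketch of the higher-order pattern: the factory captures its
# arguments and returns a (conf, path, value) test function.
def make_regex_test(pattern):
    compiled = re.compile(pattern)
    def test_fn(conf, path, value):
        if not isinstance(value, str):
            return '%r is not a string' % (value,)
        if not compiled.search(value):
            return '%r does not match /%s/' % (value, pattern)
        return None
    return test_fn

username_test = make_regex_test('^[a-z][a-z0-9]*$')
print(username_test(None, 'username', 'carol57'))   # None (accepted)
print(username_test(None, 'username', 'Carol C.'))  # explanation string
```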
In an earlier segment, we explained that rule paths cannot contain partial wildcards (e.g., user-* to require that all fields start with a particular prefix). However, test functions do offer a way to recognize such properties and more. Consider a config file in which a block must contain a default field, and can optionally contain zero or more fields that must all match a partial wildcard like user-*.
Test functions offer a way to implement any test that we can write as a Python function:
def has_default_and_user_fields(conf, path, value):
    if 'default' not in value:
        return '"default" field is missing'
    for field in value:
        if field != 'default' and not field.startswith('user-'):
            return '"{0}" is neither "default" nor starts with "user-"'.format(field)
    return None
confp = YamlConfigParser()
confp.add_rule('uploads', test=has_default_and_user_fields)
confp.add_rule('uploads.*', path_type=str, optional=True)
The following config file would be accepted by this parser:
uploads:
  default: /var/share/SomeApp/uploads
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The following config file would be rejected by this parser:
uploads:
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The parse error would include the explanation "uploads" test failed: "default" field is missing.
Of the three parameters (i.e., conf, path, and value), all of the packaged test functions depend only on value, and that will typically be the case. By providing the entire conf object as well as the path to the value being verified, yamlfig enables a test to evaluate the value in the context of the rest of the config file if necessary.
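As an illustration of such a context-dependent test, the following hypothetical function uses conf to check one field against another, here requiring that a mirror's port differ from the primary server's port. Plain dicts stand in for the parsed conf object in this sketch:

```python
# A context-dependent test: accept mirror.port only if it differs
# from server.port elsewhere in the config.  Plain dicts stand in
# for the parsed conf object; the rule and fields are hypothetical.
def differs_from_primary_port(conf, path, value):
    primary = conf['server']['port']
    if value == primary:
        return '"%s" duplicates server.port (%d)' % (path, primary)
    return None

conf = {'server': {'port': 81}, 'mirror': {'port': 8080}}
print(differs_from_primary_port(conf, 'mirror.port', 8080))  # None (accepted)
print(differs_from_primary_port(conf, 'mirror.port', 81))    # explanation string
```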
- Field names with leading underscores – While accessing YamlConfig fields as attributes is convenient, the drawback is that any field name that starts with a leading underscore risks colliding with the methods and attributes that implement the class. Consequently, if the parser encounters any field that starts with '_', a warning will be raised. The warning can be suppressed by setting yamlfig.print_underscore_warning.off = True. As with non-string field names, we can always look up a field with leading underscores via index lookup (e.g., conf["_field"] rather than conf._field), but the warning is intended to make us aware of the possibility of collision.
References to
transform
functions in code and documentation – The API for theYamlConfigRule
class and theYamlBaseConfig
classes expose references to atransform
function or the ability to invokedo_transform
on the container object. The ability to specify a transform is planned (and described in more detail below). As the code indicates, work on this feature was already underway when this version was released, but consider it untested, incompletely documented, and subject to change.
Several features are already on our list of things we would like to implement or have already started implementing:
- Allow standard fielded rules to co-exist alongside wildcard rules, with the standard rule taking precedence if it matches and the wildcard being used as a catch-all. We probably still want to disallow partial wildcards since (a) they can already be handled with test functions, and (b) they would raise the possibility of allowing multiple partial wildcard rules attached to the same parent path, which would raise all sorts of ambiguity about what to do if multiple rules match the same field.
- Allow a user to more easily specify a constraint on field names, for instance a field_type to do the same type checking on a field that path_type does on the value, and/or a field_regex to specify a pattern that the field must match. Currently these are possible, but they require the user to implement their own rule test, as described above.
Add options to
test.is_file_path
that (1) allow the user to specify a directory from which all relative paths are resolved, and (2) allow the user to specify a path intoconf
where such a base directory would be stored. These would enable support for a config file where onehomepath
field specifies where the program willchdir
to, and then all the other paths (e.g.,dbfile
orhtmldir
are specified relative tohomepath
). -
Extend the
rule_path
specification to allow us to express field types that are not strings or are strings that include 'special characters' like whitespace, the delimiter (.
), or the wildcard (*
). Right now, we're thinking of using square brackets in a rule path, so that the string resembles the path that would be used to access the value once parsed (e.g.,rule_path="dbhosts['192.0.2.1'].port"
would indicate the config file had a structure where adbhosts
block contained a field field named192.0.2.1
which maps to a sub-block that has aport
field). -
Implement a
test.is_in_choiceset
which verifies that the value is one of the configured options or choices. So, if a rule for adbtype
path hadtest=test.is_in_choiceset(['sqlite', 'mysql', 'mongodb'])
, it would verify that thedbtype
value took one of those values, returning an explanatory error message if not. -
Extend
YamlConfigParser
with a function to write a config-file template (or and actual config file if provided with aconf
object) to use for the values. For each rule, it is already possible to specify adesc
describing the purpose of the path, and anexample
value. These values could be incorporated into the config-file template, making it somewhat self documented, and making it easier for a program to provide its users with a template. In truth, we would probably want to implement this as aRepresenter
class that inherits from and extendsSafeDumper
, but with that class invoked by something likeconf.write_file(filename, conf=None)
. -
Extend
YamlConfigRule
with support for transformations. A goal for yamlfig was to gather into one package all of the things that we find ourselves doing over and over again when we read in a config file (e.g., checking whether fields exist, that they have the right types, and that they meet various other conditions). Another thing we do at this stage is converting them to the objects that we really want to use in our program. For instance, we don't really want the path to the log file, we want the open filehandle to it; we don't really want the IPv4 address in dotted-quad notation, we want theIPv4Address
object that we can construct with it. Some initial groundwork for such transformations already exists within the code, but it needs to be built out and tested (lots and lots of testing, especially the interaction with the write-out-configs extension described above).
This list is neither exhaustive nor a promise of what is certain to come. Other suggestions are, of course, welcome too.
We welcome reports of issues and other contributions through our package's page on GitHub:
Note that this is our first open-source project, and it was shared in part so that we could get more experience with the standard tools and workflows. We aim to respond to any issues, requests, or other feedback promptly and professionally, but some understanding may be required since we are learning as we go.