The yamlfig package provides developers with a framework for defining rules that test and verify a config file's structure. Those rules are captured in a parser object which can be applied to YAML-based config files to validate them.
In particular, this package enables a developer to:
- define which fields are required, optional, or will be assigned default values if omitted;
- declare types for those fields (e.g., str, int, date, dict, list, etc.);
- run arbitrary functions to test the values in each field (e.g., regular-expression matches or file-path-existence checks).
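The last of these capabilities, arbitrary test functions, can be pictured as a callable (or a factory returning one) that accepts a field value and reports whether it is acceptable. The following is an illustrative sketch of the idea, not yamlfig's actual API:

```python
# Sketch of a test-function factory (illustrative, not yamlfig's code):
# the factory captures bounds and returns a callable that checks one value.
def is_interval(lower, upper, include_upper=False):
    def test(value):
        if include_upper:
            return lower <= value <= upper
        return lower <= value < upper
    return test

port_ok = is_interval(1, 65535, include_upper=True)
print(port_ok(81))     # True
print(port_ok(70000))  # False
```

A parser can then apply such a callable to every value matched by a rule, rejecting the config file whenever the callable reports failure.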
After a config file is parsed, validated, and accepted, the returned object can be used to access the field values with some confidence that they exist, are of the expected type, and have whatever other properties the rules established. If a config file is rejected, an error explaining the violated rule is raised.
This package was inspired by the similar capability that argparse brought to command-line argument parsing.
The Python package is hosted on PyPI:
pip install yamlfig
The source code, documentation, and issue tracker are hosted on GitHub:
As an example of when a developer might use yamlfig, consider developing a server that binds to an address and port. When any of a list of authorized users connects, it displays the server's name and the contents of a file. The following YAML file could act as the config file for such a server:
$ cat > quickstart_server.yaml
name: Simple Single-File Server
server:
  port: 81
file_path: 'quickstart_shared_file.txt'
users:
  - alice
  - bob
  - carol
The following script uses the yamlfig package to construct a parser for this example server. It instantiates a parser object and adds a set of rules that establish which fields and structures should be in a server's config file. Those rules establish what form those fields must take and what to do if they are missing. It then invokes this parser on a config file passed as a command line argument. Where an actual server script would then use those values to spin up a server, this script just demonstrates that the values can be accessed from the parsed object by printing them:
$ cat > quickstart_server.py
from __future__ import print_function
import sys
from yamlfig import YamlConfigParser, test
# Construct a parser for the server config file
confp = YamlConfigParser()
confp.add_rule('name', path_type=str)
confp.add_rule('description', optional=True)
confp.add_rule('server.addr', path_type=str, default='127.0.0.1', test=test.is_ipv4_address)
confp.add_rule('server.port', path_type=int, test=test.is_interval(1, 65535, include_upper=True))
confp.add_rule('file_path', path_type=str, test=test.is_file_path('exists', 'isfile'))
confp.add_rule('users', path_type=list)
confp.add_rule('users.*', test=test.is_regex('^[a-z][a-z0-9]*$'))
# Parse the config file
conf = confp.parse_file(sys.argv[1])
# Retrieve values from the conf object
print('conf.name = {0}'.format(repr(conf.name)))
print('conf.description = {0}'.format(repr(conf.description)))
print('conf.server.addr = {0}'.format(repr(conf.server.addr)))
print('conf.server.port = {0}'.format(repr(conf.server.port)))
print('conf.file_path = {0}'.format(repr(conf.file_path)))
for idx in conf.users:
    print('conf.users[{0}] = {1}'.format(idx, repr(conf.users[idx])))
When we execute this server script on the above config file, it prints the following values:
$ python quickstart_server.py quickstart_server.yaml
conf.name = 'Simple Single-File Server'
conf.description = None
conf.server.addr = '127.0.0.1'
conf.server.port = 81
conf.file_path = 'quickstart_shared_file.txt'
conf.users[0] = 'alice'
conf.users[1] = 'bob'
conf.users[2] = 'carol'
Note how the fields and values printed came not only from the config file but also from the rules. Fields marked as optional or taking a default are present even though they were not in the config file. Also note how the values have the type and form required by the rules (e.g., an integer within a given interval, a string in IPv4 dotted-quad notation, and a list of usernames that all match a regular expression).
In this example, the config file specified a name to display, a port to listen on within a server block, a file_path to a file to share, and the list of accounts of authorized users. The rules added to the confp parser ensure that those fields exist; they also define rules for some paths not in the config file that are configured as optional or as taking a default value:
- a description field that, if provided, would be displayed to the user, but that is optional;
- an addr field within the server block that, if provided, would set the binding IP address, but that takes a default of 127.0.0.1 if omitted.
Additionally, the confp parser verifies that the values present in the config file are suitable for our intended use. Some of the values are type-checked or otherwise validated:
- the addr field, if provided, will be tested to confirm that it has the format of an IPv4 address (i.e., a string in dotted-quad notation);
- the port field will have its type checked to ensure it is an int, and its value will be tested to confirm that it is in the range 1 to 65535;
- the file_path will be interpreted as a path to a filesystem object, and that object will be tested to confirm it exists and is a file (rather than a directory);
- all the account names in the users list will be tested against a regular expression to confirm they match the format of accounts on this particular system (i.e., they start with a lowercase letter followed by zero or more digits or lowercase letters).
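The dotted-quad check, for instance, can be pictured with a small standalone function. This is a sketch of the idea only, not yamlfig's actual is_ipv4_address implementation:

```python
import re

def looks_like_ipv4(value):
    # Sketch of a dotted-quad check (not yamlfig's actual implementation):
    # four dot-separated decimal octets, each between 0 and 255.
    if not isinstance(value, str):
        return False
    match = re.match(r'^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$', value)
    if match is None:
        return False
    return all(int(octet) <= 255 for octet in match.groups())

print(looks_like_ipv4('127.0.0.1'))       # True
print(looks_like_ipv4('452.34.256.193'))  # False: an octet exceeds 255
```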
Once the confp parser is constructed and configured, its parse_file method is called on the config-file path given on the command line, and it returns a parsed conf object. To demonstrate that the conf object contains all the fields and values from the config file merged with the optional fields and defaults from the parser rules, the script prints those fields and values.
The values in the conf object returned by parse_file have also been type-checked and tested. Had the fields and values in the config file not conformed to the rules of the confp parser, a ParseError exception would have been raised. Some examples:
- If the name field were omitted: ParseError: quickstart_server.yaml: "name" is missing
- If the server block contained a field called the_ip_address that did not match any of the parser's rules: ParseError: quickstart_server.yaml: "server.the_ip_address" unexpected by parser
- If the port field of the server block contained the string "eighty-one" rather than the integer 81: ParseError: quickstart_server.yaml: "server.port" has type str not type int
- If the addr field were present in the server block and had the value 452.34.256.193: ParseError: quickstart_server.yaml: "server.addr" failed test: 1st octet of "452.34.256.193" exceeds 255
- If the file_path field had been the path to an existing directory named some_directory instead of the path to an existing file: ParseError: quickstart_server.yaml: "file_path" failed test: "some_directory" is not a file
- If the 3rd value of the users list had been the display name Carol C. instead of the username carol (and noting zero-based indexing): ParseError: quickstart_server.yaml: "users.2" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/
The presence of such errors in the config file would have stopped execution and provided a relatively informative explanation of which rule failed and why. Because none of these errors were raised, a developer has some assurance that the structure and values in the conf object meet their expectations.
Just as important as what happened in the example above is what didn't happen. When the parse_file method returned the conf object, it didn't raise a ParseError exception. Since it executed successfully, we know that all the parser assertions hold about which fields must exist and what formats they take; the remaining code does not need to perform such checks and error handling itself.
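To appreciate what the parser saves us, here is a sketch of the kind of ad-hoc checking that code consuming a plain dict would otherwise need before every use (illustrative only; the field names follow the quickstart example):

```python
# Without a validating parser, code would need manual checks like these
# scattered ahead of every use of the config (illustrative sketch):
def manual_checks(raw):
    if 'name' not in raw:
        raise ValueError('"name" is missing')
    if not isinstance(raw.get('server', {}).get('port'), int):
        raise ValueError('"server.port" must be an int')
    if not 1 <= raw['server']['port'] <= 65535:
        raise ValueError('"server.port" out of range')
    return raw

conf = manual_checks({'name': 'demo', 'server': {'port': 81}})
print(conf['server']['port'])  # 81
```

yamlfig centralizes all of these checks in the parser rules, so they run once, in one place, at parse time.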
What yamlfig provides beyond a standard YAML parser is validation: verification that a config file conforms to the rules established for it. In this section, we introduce and describe these rules and the constraints they can place on a config file's structure and values.
The typical steps when using yamlfig are:
1. instantiate a YamlConfigParser object, which we usually call confp;
2. configure it by using add_rule to add rules for each field we intend to control through a config file;
3. invoke parse_file on a config file, which either raises a ParseError or returns a YamlConfig object, usually called conf; and
4. use that YamlConfig in subsequent code, confident that its structure and values have already been validated.
The following script illustrates this typical pattern by using yamlfig. For the sake of the example, let's say we need a config file to drive how often a loop is run, which of two functions is called by the loop, and what parameter is passed to that function:
$ cat basic_usage.py
import sys
from yamlfig import YamlConfigParser
# 1. Instantiate a YamlConfigParser object (confp)
confp = YamlConfigParser()
# 2. Configure the parser by adding rules for each field
confp.add_rule('loop_count')
confp.add_rule('do_special_function')
confp.add_rule('function_parameter')
# 3. Invoke the parser on a config file (provided as an argument)
conf = confp.parse_file(sys.argv[1])
# 4. Use the YamlConfig object in subsequent code
for loop_index in range(conf.loop_count):
    if conf.do_special_function:
        special_function(conf.function_parameter)
    else:
        regular_function(conf.function_parameter)
One config file would cause the script to produce one behavior:
$ cat basic_config_1.yaml
loop_count: 7
do_special_function: yes
function_parameter: "a meerkat"
Running the script on basic_config_1.yaml would cause special_function to be invoked 7 times, each time with the parameter "a meerkat".
Another config file would cause the script to produce a different behavior:
$ cat basic_config_2.yaml
loop_count: 3
do_special_function: no
function_parameter: a pony
Running the script on basic_config_2.yaml would cause regular_function to be invoked 3 times, each time with the parameter "a pony".
Just as important is understanding the behavior of the script on a bad config file. The following config file is missing one of the three required fields:
$ cat basic_config_bad.yaml
loop_count: 3
function_parameter: 'a unicorn'
Running the script on basic_config_bad.yaml exits unsuccessfully and prints an exception:
Traceback (most recent call last):
File "basic_usage.py", line 22, in <module>
conf = confp.parse_file(sys.argv[1])
[...]
yamlfig.base.ParseError: basic_config_bad.yaml: "do_special_function" is missing
The error is raised within the confp.parse_file function. All verification and validation occurs as part of that function, called in step 3 of the pattern, so if it returns successfully, the YamlConfig object conforms with the parser rules.
The fundamental thing that yamlfig rules do is establish which fields should be in a config file and which fields should not.
confp.add_rule('dirname')
The first argument to add_rule is the rule_path. Every rule added to a confp object must have one, and, unless additional modifiers make the field optional or give it a default value, it is an existence requirement for the field. Given the rule above, any config file must contain a line such as:
dirname: /var/share/SomeApp/SharedDir
A config file without a dirname field would generate a parse error.
Part of YAML's descriptive power comes from its ability to encode nested structures, like maps and lists, and yamlfig rules can describe constraints on that structure:
confp.add_rule('server.storage.dirname')
Rules use the '.' character to delimit fields within a nested structure. The rule above expects there to be a server block, within which is contained a storage block, within which there is a dirname field. The following config would satisfy such a rule:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
Such a rule path implicitly includes existence requirements for server and server.storage. The existence of those paths would not need to be explicitly required through separate rules, unless we wanted to modify them (e.g., by making them optional or giving them defaults, as described in a later section).
In this example, we need a config file to describe how a server's local storage cache is configured (i.e., where it is on the filesystem, how big it can grow, and what permissions the cache files have).
confp.add_rule('server.storage.dirname')
confp.add_rule('server.storage.maxsize')
confp.add_rule('server.storage.umask')
In combination, these three rules describe the structure that the server.storage block must have (i.e., three fields with the names dirname, maxsize, and umask).
The following config file would be accepted by this parser:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
    maxsize: 10GB
    umask: 0644
Any field or path for which there is no matching rule will raise a parse error. In a sense, the yamlfig field-existence validation is deny-by-default. We do not need to do anything specific to assert that a field does not exist; just don't add an existence requirement.
A rule path can contain wildcards. For the sake of this example, we need a config file to specify upload paths for each of one or more users. A server.upload_paths block will map from username to the corresponding upload directory, as in the following example:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
Since we do not want to hardcode the usernames in the parser, we can use a wildcard rule to accept one or more fields within a block without specifying the field names:
confp.add_rule('server.upload_paths.*')
Such a rule asserts that the server.upload_paths block contains non-empty substructure (i.e., it is a block), but it does not name the specific fields within that substructure. In the above example config file, the wildcard would match both alice and bob, even though neither is an explicitly listed field path.
A new user could be added with their own upload path, and the same parser would accept the config file:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
    carol: /home/carol/tmp
Note that a wildcard rule must match one or more fields, not zero. If the server.upload_paths block were empty, parsing the config file would raise an error: "server.upload_paths" must contain at least one field. A later segment describes how the optional and default flags can be used with wildcards to implement a zero-or-more match.
Also note that partial wildcard matches are not currently supported. A path like server.upload_paths.user-* intending to accept fields like user-alice and user-bob would instead raise an error. As described in a different later segment, one way to implement such a check would be to write a function that tests every field within a block against a regular expression, and then specify that function as a test function for the server.upload_paths block.
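Such a block-level test function might look like the following sketch. The calling convention assumed here (the test receiving the block and iterating its field names) is illustrative, not a confirmed yamlfig signature:

```python
import re

# Sketch of a block-level test that checks every field name against a
# pattern, approximating a partial wildcard like "user-*". The way the
# block is passed to the test function is an assumption for illustration.
def fields_match(pattern):
    regex = re.compile(pattern)
    def test(block):
        return all(regex.match(field) for field in block)
    return test

check = fields_match(r'^user-[a-z]+$')
print(check({'user-alice': 1, 'user-bob': 2}))  # True
print(check({'alice': 1}))                      # False
```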
For the sake of this example, a server hosts one or more projects, each of which has an associated directory of static web pages and a backend database. Our configuration file maps from one or more arbitrary project names (e.g., ProjectX and meerkat_works) to blocks that contain precisely three datapoints: (1) a path to a directory of webpages, (2) a path or URL to a database, and (3) the type of the database (e.g., sqlite, mysql, or mongodb).
The following config file gives an example of this structure:
projects:
  ProjectX:
    webpath: /home/alice/projx/html
    dbpath: /home/alice/projx/project.db
    dbtype: sqlite
  meerkat_works:
    webpath: /home/bob/public/meerkat/www
    dbpath: mongodb://192.168.1.200:27017
    dbtype: mongodb
With yamlfig, we can specify wildcards on paths while still requiring that any fields matching the wildcard have a required, fixed-field substructure. The following parser will accept one or more blocks, each corresponding to a project name, but every one must have the three required fields:
confp.add_rule('projects.*.webpath')
confp.add_rule('projects.*.dbpath')
confp.add_rule('projects.*.dbtype')
These rules implicitly assert that the projects block exists and contains one or more sub-blocks, with no restriction on their field names. The rules explicitly assert that each of those sub-blocks must contain exactly three fields: webpath, dbpath, and dbtype.
This parser would accept the example config file above.
If a project sub-block were missing one of the three required fields or had an extra field, an error would be raised.
YAML's nested structure supports not only the mappings described above but also lists. Technically, YAML offers many different nesting types (e.g., omap, pairs, etc.), but our Python parser represents them all as either dict or list objects, with mappings represented as dict objects and lists as list objects. To accommodate both mappings and lists most easily and succinctly, yamlfig effectively treats lists as a very specific kind of mapping, where each field is a list index and each value is the item in the list.
Consider a config file where a users block contains a list of one or more authorized users:
users:
  - alice
  - bob
  - carol
The following rule would accept such a users block:
confp.add_rule('users.*')
Note that the wildcard rule by itself just ensures that there are subfields, not that they take the form of a list as opposed to a mapping. That same parser would accept a config file with a mapping:
users:
  alice: Alice A.
  bob: Bob B.
  carol: Carol C.
These two config files, the list version and the mapping version, have very different structures, and a program would likely expect one and not the other. To ensure that a rule with a wildcard matches only a list (and not a mapping) or only a mapping (and not a list), we would need to use type checking, as described in a later segment, to assert that the type of the block is either list or dict, respectively.
Also note that when accessing list values parsed into a conf object, we need to be aware of some differences in their behavior from that of a standard Python list, as described in the Handling Parsed Objects section.
While not a typical occurrence, parser rules can be configured to ensure that a list has a specific number of elements. The following rules would accept a list of length 2 by explicitly requiring fields named 0 and 1:
confp.add_rule('network.route.0')
confp.add_rule('network.route.1')
For the sake of the example, perhaps the application must have two network routes, a primary and a secondary.
As noted above, yamlfig treats lists as mappings from numeric fields to values, so the parser would accept the following config:
network:
  route:
    - 192.0.2.1
    - 198.51.100.1
As a side note, the same two rules would accept a config in which the route block contained a mapping from numeric string fields (i.e., "0" and "1") to the two IPv4 addresses. As we keep stressing, the rules simply treat lists as mappings from numeric fields to the list elements. To differentiate a list from a mapping, we would need to use type checking, as described in a later segment.
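The distinction itself is an ordinary Python type check; yamlfig's path_type parameter (e.g., path_type=list, as seen in the quickstart) expresses it as a rule. As a standalone sketch of what such a check does:

```python
# Sketch of the type check underlying a rule like path_type=list
# (illustrative; yamlfig expresses this via the path_type parameter).
def has_type(expected):
    def test(value):
        return isinstance(value, expected)
    return test

print(has_type(list)(['192.0.2.1', '198.51.100.1']))  # True
print(has_type(list)({'0': 'a', '1': 'b'}))           # False
print(has_type(dict)({'0': 'a', '1': 'b'}))           # True
```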
As described in the Basic Usage section, to parse a config file, a parser's parse_file method would be called with the name of the file:
conf = confp.parse_file(conffile)
Assuming the parsing and validation succeed, the conf object would have type YamlConfig or YamlConfigList, depending on whether the root-level YAML object in the config file is a mapping or a list. Typically, a YAML-formatted config file will have a mapping as its root-level structure, so we will consider that common case first. Throughout this section, assume that we have successfully parsed the following config file into a YamlConfig object named conf:
dirname: /var/share/SomeApp/SharedDir
server:
  projects:
    ProjectX:
      webpath: /home/alice/projx/html
      dbpath: /home/alice/projx/project.db
      dbtype: sqlite
    meerkat_works:
      webpath: /home/bob/public/meerkat/www
      dbpath: mongodb://192.168.1.200:27017
      dbtype: mongodb
users:
  - alice
  - bob
  - carol
  - dave
While the focus of this section is on accessing the conf object after confp successfully parses and validates the config file, for the sake of completeness, the following rules would configure a parser that accepts this file:
confp = YamlConfigParser()
confp.add_rule('dirname')
confp.add_rule('server.projects.*.webpath')
confp.add_rule('server.projects.*.dbpath')
confp.add_rule('server.projects.*.dbtype')
confp.add_rule('users.*')
Fields in a config file can be accessed as attributes of the YamlConfig object.
conf.dirname # '/var/share/SomeApp/SharedDir'
If an attribute corresponds to a block in a config file, it will return that block as a YamlConfig or YamlConfigList object.
conf.server # <YamlConfig object at 0x[...]>
conf.users # <YamlConfigList object at 0x[...]>
As such, attributes can be strung together in a sequence:
conf.server.projects.ProjectX.webpath # '/home/alice/projx/html'
conf.server.projects.ProjectX.dbpath # '/home/alice/projx/project.db'
conf.server.projects.ProjectX.dbtype # 'sqlite'
conf.server.projects.meerkat_works.dbtype # 'mongodb'
Note that to be accessed as an attribute, a field name must be a valid Python identifier (e.g., it must be a string, it cannot start with a digit, etc.).
Values stored in a YamlConfig object can also be accessed via index lookup.
conf.server.projects['ProjectX'].dbtype # 'sqlite'
proj = 'ProjectX'
conf.server.projects[proj].dbtype # 'sqlite'
conf['server']['projects']['ProjectX']['dbtype'] # 'sqlite'
path = ['server', 'projects', 'ProjectX', 'dbtype']
functools.reduce(lambda d, idx: d[idx], path, conf) # 'sqlite' (assumes import functools)
Index lookups must be used to access the elements of a YamlConfigList, since attribute names cannot be numbers.
conf.users[0] # 'alice'
conf.users[1] # 'bob'
conf.users[2] # 'carol'
conf.users[3] # 'dave'
conf.users[-1] # 'dave'
conf.users[-2] # 'carol'
In a departure from standard Python lists, a YamlConfigList object will translate to or from a string representation of an index as needed.
conf.users["1"] # 'bob'
conf.users['-2'] # 'carol'
Once again, this is to allow lists, as much as possible, to be treated like mappings from the list indexes to the list elements.
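The index translation behaves roughly as if the list were wrapped as in the following sketch (illustrative only, not yamlfig's actual implementation):

```python
# Sketch of the index translation described above (not yamlfig's code):
# integer or string indexes, positive or negative, all reach the same item.
class StringIndexList:
    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # Convert "1" or "-2" (or plain ints) to an int before indexing.
        return self._items[int(index)]

users = StringIndexList(['alice', 'bob', 'carol', 'dave'])
print(users[1])     # 'bob'
print(users['1'])   # 'bob'
print(users['-2'])  # 'carol'
```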
As with dict and list objects, we can see how many elements are in a YamlConfig or YamlConfigList object by querying its length.
len(conf) # 3
len(conf.server) # 1
len(conf.server.projects) # 2
len(conf.server.projects.ProjectX) # 3
len(conf.server.projects.meerkat_works) # 3
len(conf.users) # 4
Iterating on a YamlConfig object will return the field names contained within the block, like what we would get from iterating on a dict object:
list(conf) # ['dirname', 'server', 'users']
list(conf.server) # ['projects']
list(conf.server.projects) # ['ProjectX', 'meerkat_works']
list(conf.server.projects.ProjectX) # ['webpath', 'dbpath', 'dbtype']
list(conf.server.projects.meerkat_works) # ['webpath', 'dbpath', 'dbtype']
Note that the order in which YamlConfig fields are returned is the order in which the rules were added to the parser, not the order in which the fields appear in the config file. When a single parser rule matches multiple fields (i.e., a wildcard rule), the fields are returned in arbitrary order. Note, however, that around Python 3.6 and 3.7, such fields started being returned in the order they appear in the config file, likely because dict objects began returning keys in insertion order.
Iterating on a YamlConfigList object is significantly different from iterating on a Python list. In particular, it will return the list of index values as strings, not the actual list values:
list(conf.users) # ['0','1','2','3']
As noted previously, a YamlConfigList treats lists less like lists per se and more like mappings from zero-based, sequential, numeric indexes to values. As such, its iterator returns field names that can be used as indexes to look up values, not the values themselves.
This behavior is likely unexpected at first and arguably controversial, but it was chosen for greater overall simplicity. A lot of code can iterate over fields, descend into blocks, and so on much more simply when it does not need to treat YamlConfigList objects as a special case, separate from YamlConfig objects.
To get the values rather than the indexes, we recommend a list comprehension:
[conf.users[idx] for idx in conf.users] # ['alice', 'bob', 'carol', 'dave']
Unlike the fields of a YamlConfig, the indexes of a YamlConfigList will be returned in a specific order: sequential and increasing from a base of zero.
Having examined how to configure a parser to require certain fields and structure, and how values will be represented in the parsed object, we introduce ways to make fields optional, give them default values, and have the parser ignore their substructure.
When instantiating and adding a new rule, we can specify optional=True:
confp.add_rule('name')
confp.add_rule('description', optional=True)
The above parser would require a name field but not a description field, as in the following config file:
name: Simple Single-File Server
The parser will accept the file, create a description field, and assign it the value None.
conf.name # 'Simple Single-File Server'
conf.description # None
A program acting on the conf object can assume that the optional field exists, but it will have the value None if it was not present in the config file (or if it was explicitly assigned the value None, since the two are treated as equivalent).
A rule representing a nested block can be marked optional and still have substructure with required fields. For the sake of example, a server requires three files in order to encrypt its communications using SSL. If a server.ssl block is present in the config file, those files must be provided, and the server will use SSL. If the block is omitted, the config file should still be accepted, but the server will fall back to unencrypted communications.
The following parser is configured with an optional server.ssl block that, if it exists, must have three specific fields:
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
In the following config, the optional ssl block and its substructure have been omitted:
server:
  addr: 127.0.0.1
  port: 81
Since the block was not included, the ssl field is present in the conf object but assigned a value of None.
conf.server.ssl # None
In the following config, the optional ssl block and its substructure have been included:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
Since the block was included, its substructure was parsed and validated. The conf object includes the block and its substructure.
conf.server.ssl # <YamlConfig object at 0x[...]>
conf.server.ssl.key # '/etc/ssl/privkey.pem'
conf.server.ssl.cert # '/etc/ssl/cert.pem'
conf.server.ssl.chain # '/etc/ssl/full_chain.pem'
The existence requirements on the substructure will only be checked and enforced if the optional field is present. In the following config, the optional ssl block is present, but it is missing one of its required fields:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    # cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
When parse_file is invoked on this config file, a ParseError is raised: "server.ssl.cert" is missing.
When instantiating and adding a new rule, we can specify a default.
confp.add_rule('server.addr', default='127.0.0.1')
confp.add_rule('server.port')
In the following config, the default rule has been omitted:
server:
  port: 81
The parser will accept the file, create not only a port field but also an addr field within the server block, and, since the addr field does not appear in the config, assign the default value (127.0.0.1) to that field.
conf.server.port # 81
conf.server.addr # '127.0.0.1'
The following rule will provide an entire server block if none is specified in the config file:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
Providing such structure is possible, but the above rule would generate a ParseError unless it were accompanied by rules to accept the server.addr and server.port paths. With only the rule above, a config file that triggered the default would raise a ParseError: "server.addr" unexpected by parser.
We need to add rules to prepare the parser for the substructure, as in the following parser that accompanies the default rule with two more:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
With these two additional rules, a config file will be accepted with the default values if server is omitted, and those two values will be required if a server block is present. In both cases, once parsing succeeds, the program can assume that conf.server.addr and conf.server.port exist.
If we really did not want to validate the fields of the default substructure, rather than adding rules for the fields, we could mark the block as no-follow as described in a later segment.
The optional and default parameters to add_rule are mutually exclusive; if both are specified, an error will be raised.
Essentially, optional=True acts like a default rule for which the default value is None. In fact, setting optional=True is the only way for a missing field to be assigned a value of None, since setting default=None is a no-op. A value of None for default actually signals that no default has been specified, so the field is still required.
It is unclear what the semantics would even be for an optional rule that also takes a default, so the pairing is just not allowed.
In the following parser configuration, the server block takes a default, while the server.ssl block is optional:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
We can see what will happen with the following config, where the server field is omitted. Note that this config file uses the convention that leaving a field value blank assigns it a value of None (or null in YAML terms), which causes it to be treated as omitted by yamlfig:
server:
The parser above will accept this config file. Since the server block has been omitted, it will substitute its default value. Since the server.addr and server.port fields are provided by the default, they will pass the rules requiring their existence. The server.ssl field has not been provided by the default, but since it is flagged as optional, the field will be created and assigned a value of None:
conf.server # <YamlConfig object at 0x[...]>
conf.server.addr # '127.0.0.1'
conf.server.port # 81
conf.server.ssl # None
The reverse is also true. Default fields can be included in the substructure of an optional field, and they will take the default values if they are omitted from the config but the optional block is included. Other combinations work as well (e.g., default fields within default blocks; optional fields within optional blocks; optional fields within default blocks within optional blocks; etc.).
If it helps, we can think of optional and default flags as being handled from the top down in a cascade. If a parent field is omitted, it is checked for optional or default flags first. If it is optional, the field is created with a None value and parsing moves on. If it takes a default, the field is created with the default value or substructure, and the parser descends into that substructure, checking those fields and values before moving on. The parser only encounters child fields and values after the parent's optional or default nature has been handled.
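The cascade can be sketched in plain Python. This is an illustrative model only, not yamlfig's actual implementation; the resolve function and its dict-based rule format are hypothetical stand-ins for the parser's internals:

```python
# Illustrative sketch of the top-down optional/default cascade.
# Not yamlfig's actual code; "rules" is a hypothetical mapping
# from field name to a dict of flags.

def resolve(config, rules):
    """Fill in omitted fields according to optional/default flags."""
    result = dict(config or {})
    for field, rule in rules.items():
        if result.get(field) is None:  # omitted (or explicitly null)
            if rule.get("optional"):
                result[field] = None             # create with None, move on
            elif "default" in rule:
                result[field] = rule["default"]  # substitute the default...
            else:
                raise ValueError('"%s" is required' % field)
        # ...then descend into any substructure to check child rules
        if isinstance(result[field], dict) and "children" in rule:
            result[field] = resolve(result[field], rule["children"])
    return result

rules = {
    "server": {
        "default": {"addr": "127.0.0.1", "port": 81},
        "children": {
            "addr": {},
            "port": {},
            "ssl": {"optional": True},
        },
    },
}

conf = resolve({}, rules)
print(conf["server"]["addr"])  # 127.0.0.1
print(conf["server"]["ssl"])   # None
```

With an empty config, the server default is substituted first, and only then are the child rules checked against that substituted block, which is why the required addr and port rules pass.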
As noted earlier, a wildcard rule path requires that a config file have one or more fields matching the path. By default, a wildcard rule will raise an error if there are no fields matching it, but there are times when we want to accept zero-or-more matches.
The following rules configure a parser for cases where we want a block with zero or more subfields:
confp.add_rule('server.upload_paths', default={})
confp.add_rule('server.upload_paths.*', optional=True)
The optional flag on the wildcard path will cause the parser to allow the upload_paths block to contain no fields. The default on the upload_paths field will create that empty block if the field is null. The following config file would be accepted by this parser:
server:
  upload_paths:
By leaving upload_paths null, we signal that it has been explicitly omitted, so it is given its default value (i.e., an empty mapping). Then, since the wildcard path for the fields within upload_paths is flagged as optional, that rule will be satisfied with zero fields. The YamlConfig object would exist but contain zero fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 0
For completeness' sake, the same parser would also accept a config file with one or more fields within upload_paths:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
In this case, the YamlConfig object would contain two fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 2
set(conf.server.upload_paths) # {'alice','bob'}
This pattern—with the block taking an empty substructure as the default and the wildcard rule flagged as optional—is the recommended way to implement a parser that accepts zero-or-more fields or list elements.
In some cases, we want to stop a yamlfig parser from attempting to validate a substructure, either because the program is designed to handle whatever is beneath that value or, more often, because the actual structure follows a complicated syntax and the program will simply pass it along to another package that has its own functions for validating the input.
For example, consider a program that needs to pull back a list of projects from a MongoDB database. The following config file provides values that might be needed to (1) access the database, (2) reference the specific collection within the database, and (3) filter the results to only a subset of all projects, using a MongoDB query:
mongodburl: mongodb://192.168.1.200:27017/
collection: projects
filterquery: { 'is_private': { '$ne': true } }
All three fields must exist, but the filterquery field contains a MongoDB query as its substructure. MongoDB queries can be expressed as JSON objects, and YAML syntax is a superset of JSON, so the query can be expressed as JSON/YAML right within the YAML config file. By default, the yamlfig parser will try to validate that object (i.e., check whether the paths filterquery.is_private and filterquery.is_private['$ne'] are expected by the parser). Configuring a confp parser to correctly validate the syntax of an arbitrary MongoDB query is impractical and an unnecessary source of complexity. As soon as the script hands the query off to MongoDB, MongoDB will do a much better job of validating it.
The following parser is configured to accept the above config file:
confp.add_rule('mongodburl')
confp.add_rule('collection')
confp.add_rule('filterquery', nofollow=True)
This parser will require that a filterquery field exists along with the mongodburl and collection fields, but the nofollow argument ensures that it will not descend into the substructure within the filterquery field. No additional validation of that substructure will take place.
The value at conf.filterquery is a standard Python dict which can be passed to a MongoDB find command as-is:
conf.filterquery # {'is_private': {'$ne': True}}
As an aside, note that until this example, we have been using YAML block-structure syntax rather than JSON syntax, but there is no difference between the two formats once parsed. The config file above could have been written equivalently as follows:
mongodburl: mongodb://192.168.1.200:27017
collection: projects
filterquery:
  is_private:
    "$ne": true
A distinct alternative would have been to encode the MongoDB query object as a string, as in the following line:
filterquery: "{ is_private: { $ne: true } }"
While this is a viable alternative, there are benefits to not doing so. By storing the query as a structured object, we actually do perform some syntax checking at parse time, before handing it off to MongoDB: we ensure that the brackets are balanced and the JSON is legal. We also get whatever syntax highlighting our editor provides for YAML/JSON. A string would simply be treated as a string by the yamlfig parser, and we would eventually have to invoke a JSON parser ourselves.
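To illustrate that last point: if the query were stored as a string, the program would have to parse it itself before handing it to MongoDB, for example with the standard json module (and the string would then have to be strict JSON, with quoted keys and lowercase true):

```python
import json

# The structured form: yamlfig hands us a dict directly.
structured = {'is_private': {'$ne': True}}

# The string form: we must invoke a parser ourselves, and the
# string must be strict JSON (quoted keys, lowercase true).
as_string = '{ "is_private": { "$ne": true } }'
parsed = json.loads(as_string)

print(parsed == structured)  # True
```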
The nofollow parameter really affects the handling of the value, not the field, whereas optional and default are parameters that affect the handling of the field (i.e., what to do if it is omitted).
If a field is omitted from a config, and its path is marked in the parser as both optional and no-follow, the field will be created and assigned the value None. (If it were not optional, an error would be raised.) Since the value None is terminal and has no substructure, being marked no-follow has little effect.
If a field is omitted from a config, and its path is marked in the parser as no-follow and taking a default, the field will be created and the default value will be substituted. If the default value has substructure (i.e., it is a dict or a list), then the no-follow marking applies and no additional validation is performed by the parser on that substructure.
A parser configuration such as the following would raise an error:
confp.add_rule('filterquery', nofollow=True)
confp.add_rule('filterquery.is_private')
The no-follow condition on a path means that no rules on descendant paths will ever be checked or validated, so we prevent such rules from being added. For this example, a ValueError would be raised explaining that "filterquery.is_private" is a descendant of a no-follow rule.
After establishing that every required field exists, that every optional or default field has been handled, and that there are no unexpected fields, the yamlfig parser's next step is to check that any type assertions on the values for each field are satisfied.
When instantiating and adding a new rule, we can specify a path_type:
confp.add_rule('server.addr', path_type=str)
confp.add_rule('server.port', path_type=int)
In addition to requiring that the server block contains an addr field and a port field, these rules will further check that the values are instances of the given path_type types.
The following config has a null in the addr field:
server:
  addr: ~
  port: 81
The above parser would raise an error: "server.addr" has type NoneType not type str.
Likewise, the following config has a string in the port field:
server:
  addr: 127.0.0.1
  port: "81"
The above parser would raise an error: "server.port" has type str not type int.
The type that a value takes is determined by the underlying raw-YAML parser that yamlfig uses. By default, we use SafeLoader within PyYAML. It recognizes the following types:
- bool
- str
- unicode (in Python 2, when the value contains non-ASCII characters)
- int
- long (in Python 2, when the value is larger than sys.maxint)
- float
- date (in the datetime package)
- datetime (in the datetime package)
- dict (for mappings and mapping-like tags)
- list (for lists and list-like tags)
- NoneType (i.e., path_type=type(None))
Any of those types could arise in a config and be accepted or rejected by a path_type argument. Additionally, it is possible to replace SafeLoader with a different YAML parsing class, in which case the set of types would depend on what types that class constructed.
In the following config, the timeout field will be parsed as a float:
server:
  timeout: 1.2
But in the following config, the timeout value will be parsed as an int:
server:
  timeout: 1
Assuming the underlying server wants a float but handles the conversion of an int by itself, we don't really want to force a user to add spurious decimal points (e.g., changing 1 to 1. would ensure the parser returns a float) just to achieve a single consistent type across all configs.
The following parser configuration rule will accept a timeout that is either an int or a float:
confp.add_rule('server.timeout', path_type=(int, float))
By specifying a tuple of types, we can direct yamlfig to accept values that are instances of either type.
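The tuple form mirrors Python's built-in isinstance, which likewise accepts a tuple of types. The helper below is a minimal stand-in for what such a check does, not yamlfig's code:

```python
# isinstance accepts a tuple of types: the check passes if the value
# is an instance of any type in the tuple.  Illustrative helper only.
def accepts_int_or_float(value):
    return isinstance(value, (int, float))

print(accepts_int_or_float(1))     # True
print(accepts_int_or_float(1.2))   # True
print(accepts_int_or_float("1"))   # False
```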
This feature was a lot more urgent in Python 2, where we almost always wanted a string to be checked against (str, unicode), so that the appearance of a word with an accent or umlaut in a descriptive string wouldn't suddenly cause our config file to be rejected. Things have gotten calmer with Python 3 (e.g., str vs unicode and int vs long are no longer issues), but union types do still arise (e.g., int vs float).
Consider this parser configured to accept mappings from project names to descriptions:
confp.add_rule('projects', path_type=dict)
confp.add_rule('projects.*')
Contrast it with this parser configured to accept lists of authorized user names:
confp.add_rule('users', path_type=list)
confp.add_rule('users.*')
In both cases, the wildcard rule accepts one-or-more arbitrary fields within the block, but as noted in an earlier segment, the wildcard does not distinguish a map from a list. The path_type=dict constraint is what ensures that the first parser accepts config files with a map, like this:
projects:
  ProjectX: "Project X is an eXtreme project (for more info talk to Alice)"
  meerkat_works: "Bob's not-quite skunkworks project"
The path_type=list constraint is what ensures that the second parser accepts config files with a list, like this:
users:
- alice
- bob
- carol
Since few programs are written to expect either a mapping or a list, we typically want to use type checking to ensure that a config-file block contains the expected structure.
While we have so far considered YAML files with a map structure at the root level, a YAML file could also be a list:
- addr: 192.0.2.200
  port: 81
- addr: 192.0.2.201
  port: 81
- addr: 198.51.100.15
  port: 8080
- addr: 203.0.113.130
  port: 8080
In this example, these address-port pairs might be a list of mirrors, ordered by proximity. The following parser is configured to check that the YAML file itself is a list, and then that each element of the list has the proper substructure:
confp = YamlConfigParser(path_type=list)
confp.add_rule('*.addr', path_type=str)
confp.add_rule('*.port', path_type=int)
After reading and validating the above config file, this parser returns a YamlConfigList object:
len(conf) # 4
conf[0].addr # '192.0.2.200'
conf[0].port # 81
Note that in this example, we actually included the instantiation of the YamlConfigParser as confp. All our previous examples (after Basic Usage) assumed that step. But when we are making assertions about the top-level object parsed from the config file, those are configured as part of the YamlConfigParser instantiation.
Down deep, every rule that gets added to the parser is of type YamlConfigRule. The YamlConfigParser class inherits from YamlConfigRule, and any arguments are used to validate the root-level object rather than any particular field or path within the object. The primary difference between the root YamlConfigParser object and the YamlConfigRule objects that are added to it is that the YamlConfigParser cannot have a rule_path while the other objects must have one. Additionally, the root-level object cannot be optional or take a default. (It can be flagged no-follow, though.)
A yamlfig parser will not accept a config file without any substructure. Technically, this is a valid YAML file:
42
A standard YAML parser will parse it as an int. However, yamlfig will raise an error: config is a(n) int but a record or list is expected.
Honestly, if our program must accept config files consisting of a single value, yamlfig might not be the right tool for the situation. If we still desired to make a go of it, we could nest that value in a single-field mapping, like so:
number: 42
The following parser would accept that config file, with a single field whose value is an int:
confp = YamlConfigParser(path_type=dict)
confp.add_rule('number', path_type=int)
Even more concisely, we could nest the value in a singleton list:
- 42
The following parser would accept such a config file:
confp = YamlConfigParser(path_type=list)
confp.add_rule('0', path_type=int)
Note the space between - and 42. Without it, the singleton list collapses back to a single (negative) integer:
-42
Such a single value would not be accepted.
While type checking helps validate the values in a config file, we often want to place additional constraints on those values. For instance, we might want a value not only to be an int but also to fall within a particular range. We might want another value not only to be a str but also to match a regular expression. We might want a third value not only to be a str but also to point to an existing file.
When adding a rule to a parser, we can specify a test function using the test argument, to perform additional checking of values. The yamlfig.test module contains a variety of pre-packaged test functions for some common validation scenarios.
This parser rule will constrain username to start with a lowercase letter followed by zero or more lowercase letters or numbers:
confp.add_rule('username', test=test.is_regex('^[a-z][a-z0-9]*$'))
This config file would be accepted by such a parser:
username: 'carol57'
This config file would be rejected:
username: 'Carol C.'
The parse error would include the explanation "username" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/.
This parser rule will constrain the addr field of the server block to be a valid dotted-quad IPv4 address:
confp.add_rule('server.addr', test=test.is_ipv4_address)
The yamlfig.test module that contains these common validation tests has been directly imported as the test object in the examples of this documentation, but it would otherwise be accessed as yamlfig.test (e.g., test=yamlfig.test.is_regex('^[a-z][a-z0-9]*$')).
The following test functions are available in the yamlfig.test module:
- is_interval(lower, upper) verifies that the value is within the range defined by the lower and upper bounds;
- is_regex(regex, invert=False) verifies that the value matches the regular expression (or does not match it, if inverted);
- is_ipv4_address verifies that the value is an IPv4 address in dotted-quad notation;
- is_domain_name verifies that the value conforms to the specification of a DNS domain name (which, note, is a looser constraint than that it be an actual operating and reachable domain name);
- is_email_address verifies that the value (roughly) conforms to the specification of an email address;
- is_url verifies that the value (roughly) conforms to the specification for URLs;
- is_file_path(*ostests) takes one or more strings corresponding to properties of filesystem objects, interprets the value as a filesystem path, and verifies that the path satisfies all of the listed properties. Properties include:
  - 'exists' and '!exists': the path exists (or does not);
  - 'isdir' and '!isdir': the path is a directory (or is not);
  - 'isfile' and '!isfile': the path is a file (or is not);
  - 'islink' and '!islink': the path is a symlink (or is not);
  - 'ismount' and '!ismount': the path is a mount point (or is not).
Note that for the higher-order functions (i.e., the ones that return test functions suitable for test), not all optional parameters are shown and described here. Check the help documentation for each function for additional detail on usage and options.
The test parameter to add_rule takes a function with three parameters: conf, path, and value. It signals acceptance of the value by returning None, and rejection by returning a string explaining what caused the failure.
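A minimal custom test following that contract might look like the following. The function name and messages are our own; only the (conf, path, value) signature and the None-or-string return convention come from the specification above:

```python
# A minimal test function following the (conf, path, value) contract:
# return None to accept the value, or a string explaining the rejection.
def is_port_number(conf, path, value):
    if not isinstance(value, int):
        return '%r is not an integer' % (value,)
    if not (1 <= value <= 65535):
        return '%d is not in the range 1-65535' % value
    return None

print(is_port_number(None, 'server.port', 81))     # None (accepted)
print(is_port_number(None, 'server.port', 70000))  # explanation string
```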
Some of the test functions packaged within yamlfig, like test.is_ipv4_address, directly match that specification. Others, like test.is_regex, return a function that matches the specification based on their arguments.
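The higher-order style can be sketched as a closure: the outer function takes configuration arguments and returns an inner function with the (conf, path, value) signature. This is an illustration of the pattern, not yamlfig's actual is_regex:

```python
import re

# Sketch of the higher-order pattern: the factory captures its
# arguments and returns a (conf, path, value) test function.
def make_regex_test(pattern):
    compiled = re.compile(pattern)
    def test_fn(conf, path, value):
        if not isinstance(value, str):
            return '%r is not a string' % (value,)
        if not compiled.search(value):
            return '%r does not match /%s/' % (value, pattern)
        return None
    return test_fn

username_test = make_regex_test('^[a-z][a-z0-9]*$')
print(username_test(None, 'username', 'carol57'))   # None (accepted)
print(username_test(None, 'username', 'Carol C.'))  # explanation string
```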
In an earlier segment, we explained that rule paths cannot contain partial wildcards (e.g., user-* to require that all fields start with a particular prefix). However, test functions do offer a way to recognize such properties and more. Consider a config file in which a block must contain a default field, and can optionally contain zero or more fields that must all match a partial wildcard like user-*.
Test functions offer a way to implement any test that we can write as a Python function:
def has_default_and_user_fields(conf, path, value):
    if 'default' not in value:
        return '"default" field is missing'
    for field in value:
        if field != 'default' and not field.startswith('user-'):
            return '"{0}" is neither "default" nor starts with "user-"'.format(field)
    return None
confp = YamlConfigParser()
confp.add_rule('uploads', test=has_default_and_user_fields)
confp.add_rule('uploads.*', path_type=str, optional=True)
The following config file would be accepted by this parser:
uploads:
  default: /var/share/SomeApp/uploads
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The following config file would be rejected by this parser:
uploads:
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The parse error would include the explanation "uploads" test failed: "default" field is missing.
Of the three parameters (i.e., conf, path, and value), all of the packaged test functions depend only on value, and that will typically be the case. By providing the entire conf object as well as the path to the value being verified, yamlfig enables a test to evaluate the value in the context of the rest of the config file if necessary.
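As an illustration of such a context-dependent test, the following hypothetical function uses conf to check one field against another, here requiring that a mirror's port differ from the primary server's port. Plain dicts stand in for the parsed conf object in this sketch:

```python
# A context-dependent test: accept mirror.port only if it differs
# from server.port elsewhere in the config.  Plain dicts stand in
# for the parsed conf object; the rule and fields are hypothetical.
def differs_from_primary_port(conf, path, value):
    primary = conf['server']['port']
    if value == primary:
        return '"%s" duplicates server.port (%d)' % (path, primary)
    return None

conf = {'server': {'port': 81}, 'mirror': {'port': 8080}}
print(differs_from_primary_port(conf, 'mirror.port', 8080))  # None (accepted)
print(differs_from_primary_port(conf, 'mirror.port', 81))    # explanation string
```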
- Field names with leading underscores – While accessing YamlConfig fields as attributes is convenient, the drawback is that any field name that starts with a leading underscore risks colliding with the methods and attributes that implement the class. Consequently, if the parser encounters any field that starts with '_', a warning will be raised. The warning can be suppressed by setting yamlfig.print_underscore_warning.off = True. As with non-string field names, we can always look up a field with leading underscores via index lookup (e.g., conf["_field"] rather than conf._field), but the warning is intended to make us aware of the possibility of collision.
References to
transform
functions in code and documentation – The API for theYamlConfigRule
class and theYamlBaseConfig
classes expose references to atransform
function or the ability to invokedo_transform
on the container object. The ability to specify a transform is planned (and described in more detail below). As the code indicates, work on this feature was already underway when this version was released, but consider it untested, incompletely documented, and subject to change.
Several features are already on our list of things we would like to implement or have already started implementing:
- Allow standard fielded rules to co-exist alongside wildcard rules, with the standard rule taking precedence if it matches and the wildcard being used as a catch-all. We probably still want to disallow partial wildcards since (a) they can already be handled with test functions, and (b) they would raise the possibility of allowing multiple partial wildcard rules attached to the same parent path, which would raise all sorts of ambiguity about what to do if multiple rules match the same field.
- Allow a user to more easily specify a constraint on field names, for instance a field_type to do the same type checking on a field that path_type does on the value, and/or a field_regex to specify a pattern that the field must match. Currently these are possible, but they require the user to implement their own rule test, as described above.
Add options to
test.is_file_path
that (1) allow the user to specify a directory from which all relative paths are resolved, and (2) allow the user to specify a path intoconf
where such a base directory would be stored. These would enable support for a config file where onehomepath
field specifies where the program willchdir
to, and then all the other paths (e.g.,dbfile
orhtmldir
are specified relative tohomepath
). -
Extend the
rule_path
specification to allow us to express field types that are not strings or are strings that include 'special characters' like whitespace, the delimiter (.
), or the wildcard (*
). Right now, we're thinking of using square brackets in a rule path, so that the string resembles the path that would be used to access the value once parsed (e.g.,rule_path="dbhosts['192.0.2.1'].port"
would indicate the config file had a structure where adbhosts
block contained a field field named192.0.2.1
which maps to a sub-block that has aport
field). -
Implement a
test.is_in_choiceset
which verifies that the value is one of the configured options or choices. So, if a rule for adbtype
path hadtest=test.is_in_choiceset(['sqlite', 'mysql', 'mongodb'])
, it would verify that thedbtype
value took one of those values, returning an explanatory error message if not. -
Extend
YamlConfigParser
with a function to write a config-file template (or and actual config file if provided with aconf
object) to use for the values. For each rule, it is already possible to specify adesc
describing the purpose of the path, and anexample
value. These values could be incorporated into the config-file template, making it somewhat self documented, and making it easier for a program to provide its users with a template. In truth, we would probably want to implement this as aRepresenter
class that inherits from and extendsSafeDumper
, but with that class invoked by something likeconf.write_file(filename, conf=None)
. -
Extend
YamlConfigRule
with support for transformations. A goal for yamlfig was to gather into one package all of the things that we find ourselves doing over and over again when we read in a config file (e.g., checking whether fields exist, that they have the right types, and that they meet various other conditions). Another thing we do at this stage is converting them to the objects that we really want to use in our program. For instance, we don't really want the path to the log file, we want the open filehandle to it; we don't really want the IPv4 address in dotted-quad notation, we want theIPv4Address
object that we can construct with it. Some initial groundwork for such transformations already exists within the code, but it needs to be built out and tested (lots and lots of testing, especially the interaction with the write-out-configs extension described above).
This list is neither exhaustive nor a promise of what is certain to come. Other suggestions are, of course, welcome too.
We welcome reports of issues and other contributions through our package's page on GitHub:
Note that this is our first open-source project, and it was shared in part so that we could get more experience with the standard tools and workflows. We aim to respond to any issues, requests, or other feedback promptly and professionally, but some understanding may be required since we are learning as we go.