Component Configuration Management: UFS Weather Model
Many components of UFS Applications are compiled Fortran codes and are configured with Fortran namelists: sets of key/value pairs organized into sections. These namelists are often generated by the configuration layer or run-time scripts of an application, depending on the data that is required. In this stage of development, we will focus on the needs of the UFS Weather Model, while keeping in mind the requirements of other UFS components.
There is a clear "value add" if the UFS Apps can pull their "single source of information" from a set of namelists (and other config files) that have been tested via the regression tests of the component code. The goal is to provide unified, Python-based tools that allow the UFS Apps to couple their namelist generation steps directly to those tested in the regression tests. This should reduce the duplication of effort (and information) currently split between the workflow scripts and the regression tests. It should also reduce the time spent managing changes to the configuration files when updating to newer versions of the UFS Weather Model.
The next sections outline how namelists are managed for the UFS Weather Model. The regression test suite in the model repository and each of the UFS Apps manage namelists differently.
The regression test suite in the model is a bash-based system that calls on the atparse utility to fill in flat namelist templates, such as tests/parm/control.nml.IN, from values exported as environment variables. There is a set of ~26 namelist templates at the time of writing. The environment variables are set in corresponding bash test files like tests/control. The templates use the @[] marker to indicate that a field should be replaced with the value of an environment variable of the same name. The same templating mechanism is used to generate a whole variety of configuration files slurped in by the model at run time, depending on the specific application, configuration, and coupling status of the model. Each of these files needs to be managed, and each will require a separate discussion of its requirements.
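For illustration (the field and variable names here are representative, not quoted verbatim from the templates), a templated line in one of these files looks something like:

```
ccpp_suite = '@[CCPP_SUITE]'
```

At run time, atparse replaces the @[CCPP_SUITE] field with the value of the CCPP_SUITE environment variable exported by the corresponding bash test file.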
What happens if an environment variable does not exist? The atparse utility contains a test function, but it is unclear whether it is routinely run as part of test automation.
The HAFS repository maintains four namelist templates for global and regional experiments and their nests under the parm/forecast directories. Like the templates in the ufs-weather-model regression test suite, these use the @[] marker to fill in values from environment variables with atparse at run time. This instance of atparse is a copy of the (assumed) original and is kept as ush/hafs_atparse.sh in the HAFS repository.
It is unclear whether ush/hafs/namelist.py is used to parse a namelist. It seems that the namelist.py tool recreates some of the functionality of the f90nml Python package, and does not appear to have unit tests.
The Global Workflow manages the FV3 namelist natively in bash. The contents of the namelist are defined as a string in ush/parsing_namelists_FV3.sh, and values (along with defaults) are set via bash environment variables. Groups of key/value pairs (not necessarily limited to Fortran namelist groups) are included based on workflow-level flags (e.g., IAU, coupled chemistry, stochastic physics, etc.) to build out a full namelist for the specific configuration of the GFS or GDAS. The resulting string is piped to a namelist file in the appropriate working directory. Defaults are applied via standard bash parameter expansion when specific environment variables have not been set.
The SRW App manages namelists in the external repository regional_workflow using a standalone Python tool called from bash, ush/set_namelist.py. The tool leverages the f90nml package along with PyYAML to ingest a default namelist (ush/templates/input.nml.FV3) as a dictionary data structure and applies changes to that dictionary as described in the YAML file ush/templates/FV3.input.yml. Each of the YAML sections is named for a set of physics to be used in the configuration. The YAML tracks only changes to the default, reducing duplication of information common to all or many SRW configurations.
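As an illustrative sketch of that layout (the suite names, namelist groups, and values below are hypothetical, not copied from ush/templates/FV3.input.yml), the YAML might look like:

```yaml
# Hypothetical example: each top-level section names a physics configuration
# and lists only the namelist entries that differ from the default namelist.
FV3_GFS_v16:
  atmos_model_nml:
    ccpp_suite: FV3_GFS_v16
  gfs_physics_nml:
    imp_physics: 11
FV3_RRFS_v1beta:
  atmos_model_nml:
    ccpp_suite: FV3_RRFS_v1beta
```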
The set_namelist.py tool also provides options to ingest two namelists (ordered or not) and output the differences between them, as well as to ingest a YAML file and create a namelist from it. Any unified tool proposed here must meet the following requirements:
- The tool must have the ability to treat an input file as a flat template file to fill in templated fields from environment variables.
- The tool must have the ability to treat an input file as a configuration file and fill in templated fields from environment variables or configuration settings.
- The tool must be callable from bash scripts.
- The tool must be sufficiently fast and efficient as to not add significant computational overhead to the scripts employing it.
- The necessary third-party packages must be supported on Tier 1 platforms as part of spack-stack.
This stage essentially just replaces the use of the atparse utility in the regression test suite with an equivalent Python + Jinja2 tool that parses flat files and substitutes values into the necessary strings. This can be applied to all of the file types that are currently templated with atparse.
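A minimal sketch of that replacement, assuming the templates have been converted from @[] markers to Jinja2's {{ }} syntax (the file handling and the behavior for missing variables shown here are illustrative choices, not the design of an existing tool):

```python
"""Render a flat Jinja2-templated configuration file from environment variables.

Illustrative sketch only; this is not the interface of an existing UFS tool.
"""
import os
import sys

from jinja2 import Environment, StrictUndefined


def render_template(template_path, output_path):
    # Load the flat template text (e.g., a namelist with {{ VAR }} fields).
    with open(template_path) as f:
        template_text = f.read()

    # StrictUndefined turns a missing environment variable into a hard error
    # instead of silently rendering an empty string -- one possible answer to
    # the "what happens if a variable does not exist?" question above.
    env = Environment(undefined=StrictUndefined)
    rendered = env.from_string(template_text).render(**os.environ)

    with open(output_path, "w") as f:
        f.write(rendered)


if __name__ == "__main__":
    render_template(sys.argv[1], sys.argv[2])
```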
This step does not provide a significant "value add" on its own, as it only provides a Python avenue to do the same thing that was previously handled by atparse. It will, however, provide a gateway to the Python tools that come in future feature enhancements of a namelist management tool and strategy.
This approach is the least disruptive in terms of transitioning to a Python-based tool for templating in the model regression test suite. Given that none of the existing users are coupled to the namelists and configuration files in the regression test framework of the model, this stage should not cause any disruption to model users.
This extension to a namelist management tool will enable a user (e.g., one of the UFS Apps) to read in a namelist stored in the regression test suite and either fill in the Jinja templates the same way they are filled in by the regression tests, and/or overwrite any existing fields with values provided by their specific configuration. The additional Python package f90nml enables Python to parse a namelist file as a dictionary data structure. When namelists are ingested as data structures, users have a more flexible way to edit any item in a namelist. This freedom comes with caveats: the flexibility associated with adding key/value pairs to a namelist raises the potential for namelists to be incompatible with the model.
Users can couple directly with namelists managed in the regression suite to benefit from tested configurations, while remaining independent of the templated variable names adopted by the regression suite and of the method by which they are filled in. For example, in the model namelist template, all values are filled in from capitalized environment variables of the same name:
```
&atmos_model_nml
  blocksize = 32
  chksum_debug = .false.
  dycore_only = .false.
  ccpp_suite = '{{ CCPP_SUITE }}'
/
```
The user can either set an environment variable with this name (e.g., export CCPP_SUITE='FV3_GFS_v16'), or provide a dictionary-based configuration data structure matching the name of the key to be updated. Here is an example of a YAML file that would allow a user to fill in any settings they may want to change, given a "templated" namelist configuration file:
```yaml
atmos_model_nml:
  ccpp_suite: 'FV3_GFS_v16'
  blocksize: 48
```
This should result in the following namelist section, as intended:
```
&atmos_model_nml
  blocksize = 48
  chksum_debug = .false.
  dycore_only = .false.
  ccpp_suite = 'FV3_GFS_v16'
/
```
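A minimal sketch of this overlay step with f90nml and PyYAML follows; the file names are placeholders, and this is not the command-line interface of set_namelist.py or any other existing tool:

```python
"""Apply YAML-described overrides to a namelist file.

Illustrative sketch only; file names are placeholders and this does not
reproduce the interface of set_namelist.py or any other existing tool.
"""
import f90nml
import yaml

# Read the YAML overrides (e.g., the atmos_model_nml block shown above).
with open("overrides.yaml") as f:
    overrides = yaml.safe_load(f)

# f90nml.patch rewrites the source namelist, replacing only the keys named
# in the override dictionary and leaving everything else untouched.
f90nml.patch("input.nml", overrides, "input.nml.patched")

# Alternatively, read the namelist into a dictionary-like Namelist object,
# update it in memory, and write it back out.
nml = f90nml.read("input.nml")
for group, settings in overrides.items():
    nml[group].update(settings)
nml.write("input.nml.updated", force=True)
```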
A similar argument can be made for any type of data file that has a parser relating its contents to a Python data structure: YAML (and YAML-like formats), JSON, Fortran namelists, INI, etc.
This tool could easily be fitted with a capability to compare two namelists and provide an accurate account of their differences. This is extremely useful when questions arise like "your namelist works, why doesn't mine?", and it addresses the difficulty of comparing namelists between users who order their namelist entries inconsistently.
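Because f90nml ingests each file into a dictionary-like structure, an order-insensitive comparison is straightforward. A minimal sketch (the function and file names are illustrative, not from an existing tool):

```python
"""Report key-by-key differences between two namelist files.

Illustrative sketch only; the function and file names are not from any
existing tool.
"""
import f90nml


def diff_namelists(path_a, path_b):
    """Return {(group, key): (value_a, value_b)} for entries that differ."""
    nml_a = f90nml.read(path_a)
    nml_b = f90nml.read(path_b)
    diffs = {}
    for group in set(nml_a) | set(nml_b):
        keys_a = nml_a.get(group, {})
        keys_b = nml_b.get(group, {})
        for key in set(keys_a) | set(keys_b):
            if keys_a.get(key) != keys_b.get(key):
                diffs[(group, key)] = (keys_a.get(key), keys_b.get(key))
    return diffs


if __name__ == "__main__":
    for (group, key), (a, b) in sorted(diff_namelists("mine.nml", "yours.nml").items()):
        print(f"&{group} {key}: {a!r} vs {b!r}")
```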
- While this method provides a more flexible way to couple configuration files with those tested in the regression suite, any modifications that are not tested are prone to failing the internal validation checks when the model is run.
- Any key/value pairs kept in the users' spaces are subject to the same code maintenance overhead currently required of maintaining full namelists: users must check that entries remain consistent with those tested as part of the model regression test suite. These checks, however, become much cheaper when the user can write tests confirming that the keys/values generated by UFS App scripts are consistent with those included in tested namelists. Reports like "your namelist contains 3 keys not contained in the regression test" could help code managers drill down to the source of problems earlier and with less frustration.
This is the toughest of all. Each section of the model namelist is ingested in disparate portions of the model, from the dycore to the various physics packages and beyond. The namelist modules that read the namelist sections may give insight into the names and types of expected variables, but the range checking and logic associated with each entry is scattered throughout the code and can be triggered as a function of compile-time or run-time switches.

There seem to be two schools of thought on approaching this problem, and the stakeholder community has much discussion ahead about the eventual outcome of validating namelists in a standard way. The first hinges on collecting all of this information (variable names, types, and value bounds) into a single database of sorts outside the model, duplicating a LOT of information and seriously violating the basic DRY (Don't Repeat Yourself) coding principle. The second is to somehow gather the necessary information from the source code itself so that model validation can be separated from the execution of the model, a path that will likely require a huge refactoring of many of the constituent modules under the ufs-weather-model umbrella. If either of these methods proves viable for defining the rules required of each namelist entry within Python, we would have a very valuable mechanism for validating namelists (and potentially several other types of configuration files) without the use of HPC resources.
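As a purely hypothetical sketch of the first approach, validation against an external rules table might look something like the following; the rule structure and values are invented for illustration and are not derived from the model source code:

```python
"""Validate namelist entries against an external rules table.

Hypothetical sketch of the "single database" approach; the rules shown here
are invented for illustration and are not derived from the model.
"""
import f90nml

# Each rule gives an expected type and, optionally, a lower bound.
RULES = {
    ("atmos_model_nml", "blocksize"): {"type": int, "min": 1},
    ("atmos_model_nml", "chksum_debug"): {"type": bool},
    ("atmos_model_nml", "ccpp_suite"): {"type": str},
}


def validate(nml_path):
    """Return a list of human-readable problems found in the namelist."""
    nml = f90nml.read(nml_path)
    problems = []
    for (group, key), rule in RULES.items():
        if group not in nml or key not in nml[group]:
            continue  # A missing entry may legitimately fall back to a default.
        value = nml[group][key]
        if not isinstance(value, rule["type"]):
            problems.append(f"{group}%{key}: expected {rule['type'].__name__}, got {value!r}")
        elif "min" in rule and value < rule["min"]:
            problems.append(f"{group}%{key}: {value} is below the minimum {rule['min']}")
    return problems


if __name__ == "__main__":
    for problem in validate("input.nml"):
        print(problem)
```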
If we template everything in the configuration of an experiment, that just kicks the can to a different type of file where we keep the information about the parameters that define an experiment. Where and how do we manage the sections of the configuration files that only certain subsets of users care about? For example, settings for HAFS that enable multiple moving nests are not necessary for the generation of configuration files for the Global or SRW workflows.