Hey there 👋! Look at you wanting to contribute! This package is designed to make it as easy as possible for you to contribute new parsers for quantum chemistry programs and to keep the code easy to maintain. This document walks you through the design decisions and how to add new parsers.
- Create a file in the `parsers` directory named after the quantum chemistry program, e.g., `terachem.py`.
- Create a `SUPPORTED_FILETYPES` set in the module containing the file types the parsers support.
- If `stdout` is a file type then create a `def get_calctype(string: str) -> CalcType` function that returns the `CalcType` for the file. One of `CalcType.energy`, `CalcType.gradient`, or `CalcType.hessian`.
- Create simple parser functions that accept file data (`str | bytes`) and a `data_collector` object. The parser should 1) parse a single piece of data from the file, 2) cast it to the correct Python type and 3) set it on the output object at its corresponding location found on the `qcio.SinglePointResults` object. Register this parser by decorating it with the `@parser()` decorator. The decorator optionally accepts a `filetype` argument (`FileType.stdout` by default) and can declare keyword arguments `required` (`True` by default), and `only` (`None` by default). See the `qcparse.utils.parser` decorator for details on what these mean.

  ```python
  @parser(filetype=FileType.stdout)
  def parse_some_data(string: str, data_collector: ParsedDataCollector):
      """Parse some data from a file."""
      regex = r"Some Data: (-?\d+(?:\.\d+)?)"
      data_collector.some_data = float(regex_search(regex, string).group(1))
  ```

- That's it! The developer just has to focus on writing simple parser functions like this and the `qcparse` package will take care of registering these parsers for the correct program and filetype and will call them at the right time when parsing a file.
See the `terachem.py` file for an overview.
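The `SUPPORTED_FILETYPES` set and the `get_calctype` function from the steps above might look roughly like the sketch below. It is self-contained so it can run on its own: the `FileType` and `CalcType` enums are local stand-ins for the package's real ones, and the output markers are hypothetical, since every program prints its own text. Treat it as an illustration of the shape of the module, not as the contents of `terachem.py`.

```python
import re
from enum import Enum


class FileType(str, Enum):
    """Local stand-in for the package's FileType enum (assumed member)."""

    stdout = "stdout"


class CalcType(str, Enum):
    """Local stand-in for the CalcType enum named in the steps above."""

    energy = "energy"
    gradient = "gradient"
    hessian = "hessian"


# The file types this hypothetical program module can parse.
SUPPORTED_FILETYPES = {FileType.stdout}

# Hypothetical output markers, checked from most to least specific because a
# gradient or hessian output usually also contains an energy line.
_CALCTYPE_MARKERS = [
    (CalcType.hessian, r"Hessian Matrix"),
    (CalcType.gradient, r"Gradient \(Hartree/Bohr\)"),
    (CalcType.energy, r"FINAL ENERGY"),
]


def get_calctype(string: str) -> CalcType:
    """Return the calculation type detected in the stdout text."""
    for calctype, pattern in _CALCTYPE_MARKERS:
        if re.search(pattern, string):
            return calctype
    raise ValueError("Could not determine the calculation type from stdout.")
```

With these made-up markers, `get_calctype("FINAL ENERGY: -76.3\n")` returns `CalcType.energy`, and text that contains none of the markers raises a `ValueError` instead of silently guessing.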
- The top-level `parse` function is called with `data_or_path: Union[Path, str, bytes]`, the `program: str` that generated the output, and the `filetype` (e.g., `stdout` or `wavefunction`, or whatever filetypes a particular program emits for which parsers have been written).
- `parse` instantiates a `ParsedDataCollector` object that acts as a proxy for the `SinglePointResults` object but offers two advantages:
  - The `SinglePointResults` object has multiple required data fields, but each parser produces only a single data value. The `ParsedDataCollector` object gets passed to the parsers and they can add their parsed value to it just as if it were a mutable `SinglePointResults` object. This makes it easy for each parser to specify both exactly what data it parses and where that data will live on the final structured object.
  - The `ParsedDataCollector` object only allows a particular data attribute to be set once. If a second attempt is made it raises an `AttributeError`. This provides a sanity check that multiple parsers aren't writing to the same field and overwriting each other.
- `parse` looks up the parsers for the `program` in the `parser_registry`. Parsers are registered by wrapping them with the `@parser` decorator found in `qcparse.parsers.utils`. The `@parser` decorator registers a parser with the registry under the program name of the module in which it is found, verifying that the `filetype` for which it is registered is supported by the `program` by checking `SUPPORTED_FILETYPES` in the parser's module. It also records whether a parser is `required`, which means an exception will be raised if its value is not found when attempting to parse a file. In order for parsers to register properly they must be imported, so make sure they are hoisted into the `qcparse.parsers.__init__` file.
- `parse` executes all parsers for the given `filetype` and converts the `ParsedDataCollector` object passed to all the parsers into a final `SinglePointResults` object.
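The flow described above can be sketched in a few lines of plain Python. This is a simplified, self-contained illustration and not the package's implementation: `SetOnceCollector` stands in for `ParsedDataCollector`, the decorator takes a hypothetical `program` argument instead of inferring the program from the module name, a missing value is signalled with `LookupError`, and the final result is a plain `dict` rather than a `SinglePointResults` object.

```python
import re
from collections import defaultdict
from types import SimpleNamespace

# Stand-in registry: program name -> list of registered parser records.
parser_registry = defaultdict(list)


class SetOnceCollector:
    """Stand-in for ParsedDataCollector: each attribute may be set only once."""

    def __setattr__(self, name, value):
        if name in self.__dict__:
            raise AttributeError(f"'{name}' has already been set.")
        super().__setattr__(name, value)


def parser(filetype="stdout", required=True, program="demo"):
    """Register the decorated function (the `program` argument is hypothetical)."""

    def decorator(func):
        parser_registry[program].append(
            SimpleNamespace(func=func, filetype=filetype, required=required)
        )
        return func

    return decorator


def _search(regex, string):
    """Return the regex match or raise LookupError when the value is absent."""
    match = re.search(regex, string)
    if match is None:
        raise LookupError(f"No match for {regex!r}")
    return match


@parser()
def parse_energy(string, data_collector):
    data_collector.energy = float(_search(r"Energy: (-?\d+(?:\.\d+)?)", string).group(1))


@parser(required=False)
def parse_charge(string, data_collector):
    data_collector.charge = int(_search(r"Charge: (-?\d+)", string).group(1))


def parse(data, program, filetype="stdout"):
    """Run every registered parser for the filetype and return the collected values."""
    collector = SetOnceCollector()
    for spec in parser_registry[program]:
        if spec.filetype != filetype:
            continue
        try:
            spec.func(data, collector)
        except LookupError:
            if spec.required:
                raise  # a required value is missing, so parsing fails
    return dict(vars(collector))  # stand-in for building SinglePointResults
```

Calling `parse("Energy: -76.4\n", "demo")` collects only the energy, because the optional charge parser finds nothing and is skipped. Input without an energy line raises `LookupError`, since that parser is marked as required.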
With all code merged to `master` and the latest code pulled down to your local machine, run: `python scripts/release.py x.x.x`