Reaction Transform descriptors

Python code to calculate reaction transform descriptors as described in CHEMRXIV, by @DocMinus and @DrAlatriste.

Installation

See the environment folder. A setup file has been added so the tools can be installed as part of one's Python environment. Tests have also been added.

Example Usage

Run the example script on a file with tab- or semicolon-separated data (comma- or space-separated files also work, though they are not recommended):

python AB2C_reaction_TDs_example.py inputfilename

You can get help by calling the script using -h: `python AB2C_reaction_TDs_example.py -h`

This particular script expects the columns of the input file in the following order:

ID reactant1 reactant2 product
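As a rough illustration of that column order, the sketch below writes a tiny tab-separated file and reads it back using only the standard library. The helper name `read_reaction_rows`, the made-up reaction, and the assumption that the file has no header row are all hypothetical; the example script uses its own reader, which may behave differently.

```python
import csv
import os
import tempfile


def read_reaction_rows(path, delimiter="\t"):
    """Read rows in the order: ID, reactant1, reactant2, product.

    Hypothetical helper for illustration only; assumes no header row.
    """
    rows = []
    with open(path, newline="") as handle:
        for record in csv.reader(handle, delimiter=delimiter):
            if len(record) != 4:
                continue  # skip blank or malformed lines
            rows.append(tuple(field.strip() for field in record))
    return rows


# Made-up amide coupling: acetic acid + methylamine -> N-methylacetamide
example_lines = ["rxn1\tCC(=O)O\tCN\tCC(=O)NC\n"]

with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.writelines(example_lines)
    tmp_path = tmp.name

reaction_rows = read_reaction_rows(tmp_path)
os.remove(tmp_path)
print(reaction_rows)
```

Each row comes back as a 4-tuple in the same order as the file's columns, so the reactant and product SMILES can be picked out by position.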

Simple cleaning of structures is included; severely broken structures might not get fixed by the provided method.
Two small test sets with made-up reactions are provided; one of them contains a "faulty" structure to demonstrate that it is correctly filtered out of the output.
Execute via: python AB2C_reaction_TDs_example.py ./datasets/testreactions.tsv

Syntax

If you only want to use the TD function, your script needs at least the following lines, with the SMILES given as collections of strings (lists in the example below), even for a single reaction:

from td_tools.rxntools import transform_descriptors

output_table = transform_descriptors(['smiles_reactant1'], ['smiles_reactant2'], ['smiles_product'])
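When reactions are stored row by row, they first have to be split into three parallel columns before they can be passed to the call above. The following is a minimal sketch of that step in plain Python. The reactions are made up, the variable names are hypothetical, and the call to `transform_descriptors` is left commented out because it needs `td_tools` to be installed.

```python
# Made-up reactions as (reactant1, reactant2, product) SMILES triples.
reactions = [
    ("CC(=O)O", "CN", "CC(=O)NC"),    # amide formation
    ("CC(=O)O", "CCO", "CC(=O)OCC"),  # esterification
]

# zip(*...) transposes the rows into three parallel columns.
reactant1, reactant2, product = (list(column) for column in zip(*reactions))

# With td_tools installed, the columns would then be passed on as shown above:
# from td_tools.rxntools import transform_descriptors
# output_table = transform_descriptors(reactant1, reactant2, product)
print(reactant1, reactant2, product)
```

Keeping the three lists the same length and in the same order preserves the pairing between each product and its reactants.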

A cleaning function as well as a file reader function is included for larger datasets:

from td_tools.rxntools import clean_smiles_multi, read_rct2pd  

The provided script includes examples on how to concatenate the structures versus the TDs.

Testing

Python tests have been added, replacing the previous test.py; see the README.md under /tests.

Acknowledgments

We would like to thank @eryl for suggestions and help regarding multiprocessing in the original build. This allowed large datasets to be processed within minutes or even seconds on a standard system, rather than hours as before.
The multiprocessing code has since been replaced by joblib, which seems a bit more stable and faster in this particular context.

Updates

  • setup.py for install as package
  • testing added
  • switch from multiprocessing to joblib
  • releases introduced; the release version number matches the tool's version number.