CoPriNet

CoPriNet is a Graph Neural Network trained on pairs of molecule 2D graphs and catalogue prices. CoPriNet predictions can be used as a proxy score for compound availability.

Installation

In order to install requirements, use conda to create an environment form the CoPriNet_env.yml file

conda env create -f CoPriNet_env.yml 
conda activate CoPriNet

Scoring molecules

In order to execute CoPriNet you only need to prepare a csv file with your SMILES and execute the following command

python -m pricePrediction.predict path/to/csvFile -o path/toResults 

E.g.

~/CoPriNet$ python -m pricePrediction.predict data/testData/npnp_dataset.csv -o npnp_scored.csv

For a complete description of the available options use:

python -m pricePrediction.predict -h

Retraining the model

Preparing dataset

If you are using your own dataset, and not the Mcule catalogue, go to step 2

Split the Mcule catalogue file into train, test, and val .csv files and chunk them into smaller files. The raw Mcule catalogue file should contain, at least, the following columns:"Mcule ID,SMILES,price 1 (USD),amount 1 (mg),delivery time 1 (w.days),available amount 1 (mg)".

python -m  pricePrediction.selectFromRawData.selectFromRawDataNonVirtualAll --full_dataset_fname path/to/Mcule/dataset.csv --computed_datadir ./prepared_partions

e.g.

~/CoPriNet$ python -m  pricePrediction.selectFromRawData.selectFromRawDataNonVirtualAll --full_dataset_fname mcule_purchasable_full_prices_210616_O9Jw1D_only_READILY.csv.gz --computed_datadir ./prepared_partions
~/CoPriNet$ ls prepared_partions/
mcule_full_test.csv_split_0.csv    mcule_full_train.csv_split_10.csv  mcule_full_train.csv_split_22.csv  mcule_full_train.csv_split_34.csv
mcule_full_test.csv_split_1.csv    mcule_full_train.csv_split_11.csv  mcule_full_train.csv_split_23.csv  mcule_full_train.csv_split_35.csv
mcule_full_train.csv_split_00.csv  mcule_full_train.csv_split_12.csv  mcule_full_train.csv_split_24.csv  mcule_full_train.csv_split_36.csv
mcule_full_train.csv_split_01.csv  mcule_full_train.csv_split_13.csv  mcule_full_train.csv_split_25.csv  mcule_full_train.csv_split_37.csv
mcule_full_train.csv_split_02.csv  mcule_full_train.csv_split_14.csv  mcule_full_train.csv_split_26.csv  mcule_full_train.csv_split_38.csv
mcule_full_train.csv_split_03.csv  mcule_full_train.csv_split_15.csv  mcule_full_train.csv_split_27.csv  mcule_full_train.csv_split_39.csv
mcule_full_train.csv_split_04.csv  mcule_full_train.csv_split_16.csv  mcule_full_train.csv_split_28.csv  mcule_full_train.csv_split_40.csv
mcule_full_train.csv_split_05.csv  mcule_full_train.csv_split_17.csv  mcule_full_train.csv_split_29.csv  mcule_full_val.csv_split_0.csv
mcule_full_train.csv_split_06.csv  mcule_full_train.csv_split_18.csv  mcule_full_train.csv_split_30.csv  mcule_full_val.csv_split_1.csv
mcule_full_train.csv_split_07.csv  mcule_full_train.csv_split_19.csv  mcule_full_train.csv_split_31.csv  params.txt
mcule_full_train.csv_split_08.csv  mcule_full_train.csv_split_20.csv  mcule_full_train.csv_split_32.csv
mcule_full_train.csv_split_09.csv  mcule_full_train.csv_split_21.csv  mcule_full_train.csv_split_33.csv

Chunked files contain two columns, "SMILES,price" and will be named following the pattern:

RAW_DATA_FILE_SUFFIX = r"mcule_full_(train|test|val)\.csv_split_\w+\.csv$"

The file pattern can be edited changing the pricePrediction/config.py file. If you are using your own catalogue you may want to split it manually into tran/test/validation and chunk the partitions into smaller files. Each chunked file should contain the header SMILES,price and the prices should be in $/g.

Create the dataset from the chunked .csv files that follow the pattern in the variable pricePrediction.config.RAW_DATA_FILE_SUFFIX The files should be csv files with two columns named SMILES,price. price should be in $/g.

Using the default paths included in pricePrediction/config.py:

python -m  pricePrediction.preprocessData.prepareDataMol2Price

Manually specifying the raw data chunks directory and the directory to store the prepared datasets. Set -n N to use N cpus.

python -m  pricePrediction.preprocessData.prepareDataMol2Price -i /path/to/dir/with/csvFiles -o /path/to/save/prepared/data
#e.g.  
python -m  pricePrediction.preprocessData.prepareDataMol2Price -i ./prepared_partions -o ./dataset_encoded/ -n 32

Train the network

python -m pricePrediction.train.trainNet -m "one messege to describe the training" --encodedDir /path/to/save/prepared/data

e.g.
python -m pricePrediction.train.trainNet -m "trial01" --encodedDir ./dataset_encoded/

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
data		data
docker		docker
pricePrediction		pricePrediction
.gitignore		.gitignore
CoPriNet_env.yml		CoPriNet_env.yml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CoPriNet

Installation

Scoring molecules

Retraining the model

Preparing dataset

About

Releases

Packages

Languages

License

oxpig/CoPriNet

Folders and files

Latest commit

History

Repository files navigation

CoPriNet

Installation

Scoring molecules

Retraining the model

Preparing dataset

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages