add minKLIFSAI data #74

Open: wants to merge 1 commit into base: main
2 changes: 2 additions & 0 deletions README.md
@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
- [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
- [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
- [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive

suggestion (documentation): Standardize number formatting throughout the description.

Consider using either million/thousand or K consistently (e.g., '18.8 million', '300K', and '900K' or '18.8 million', '300 thousand', and '900 thousand').

Suggested change
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
[minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoints for 452 kinases, resembling 1.2 million unique compounds, consisting of 300 thousand active and 900 thousand inactive compounds collected from PubChem (Jan 2023).

compounds collected from PubChem(Jan 2023).

suggestion (documentation): Add a space after 'PubChem' for better readability.

Suggested change
compounds collected from PubChem(Jan 2023).
compounds collected from PubChem (Jan 2023).

- [Photoswitch Dataset](https://github.com/Ryan-Rhys/The-Photoswitch-Dataset): Curated dataset of 405 photoswitch molecules.
- [QM Datasets](http://quantum-machine.org/datasets/): QM7, QM7b, QM8, QM9, MD Trajectories
- [SolProp](https://discord.com/channels/850068776544108564/1074753729955381298/1076099689184772116): Database of 1 million solvent/solute COSMO-RS calculations and 10145 experimental solvation free energies (originally published as part of [this paper](https://arxiv.org/abs/2012.11730)).