add minKLIFSAI data #74
Reviewer's Guide by Sourcery
This pull request adds information about a new dataset called minKLIFSAI to the README.md file. The dataset contains a large number of datapoints for kinase compounds, collected from PubChem in January 2023. No diagrams were generated, as the changes are simple and do not need a visual representation.
Hey @phalem - I've reviewed your changes - here's some feedback:
Overall Comments:
- Thank you for adding the minKLIFSAI dataset. To maintain consistency with other entries, consider reformatting the description to be more concise and align with the style of surrounding entries. Also, please check for typos and use a landing page URL instead of a direct file download link if possible.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟡 Documentation: 2 issues found
@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
- [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
- [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
- [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
compounds collected from PubChem(Jan 2023).
suggestion (documentation): Add a space after 'PubChem' for better readability.
compounds collected from PubChem(Jan 2023).
compounds collected from PubChem (Jan 2023).
suggestion (documentation): Standardize number formatting throughout the description.
Consider using either million/thousand or K consistently (e.g., '18.8 million', '300K', and '900K' or '18.8 million', '300 thousand', and '900 thousand').
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
[minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoints for 452 kinases, resembling 1.2 million unique compounds, consisting of 300 thousand active and 900 thousand inactive compounds collected from PubChem (Jan 2023).
I collected this data from PubChem in January 2023.
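For anyone who wants to inspect the file before relying on it, here is a minimal sketch of previewing a gzip-compressed CSV with only the Python standard library. It assumes `enriched_target_valid.csv.gz` has already been downloaded to the working directory, that it is UTF-8 encoded and that its first row is a header; the column names are not stated in this thread, so the helper (`preview_gzip_csv`, a hypothetical name) makes no assumptions about them.

```python
import csv
import gzip
from itertools import islice


def preview_gzip_csv(path, n_rows=5):
    """Return (header, first n_rows data rows) of a gzip-compressed CSV.

    Only the first few rows are read, so the full file is never
    decompressed into memory.
    """
    # "rt" decompresses on the fly as text; newline="" is what the csv module expects.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was saved.
    header, rows = preview_gzip_csv("enriched_target_valid.csv.gz")
    print(header)
    for row in rows:
        print(row)
```

Reading only the first rows keeps the check cheap, which matters for a file described as holding 18.8 million datapoints.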