add minKLIFSAI data #74

Open
wants to merge 1 commit into base: main

Conversation

@phalem (Contributor) commented Oct 16, 2024

I collected this data from PubChem in January 2023.

Summary by Sourcery

New Features:

  • Added minKLIFSAI dataset to the README, which includes 18.8 million datapoints for 452 kinases, representing 1.2 million unique compounds, with 300k active and 900k inactive compounds collected from PubChem.
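
The counts quoted above (datapoints, kinases, unique compounds) could be checked against the gzipped CSV with a short script. The following is only a minimal sketch: the column names `compound_id` and `target` are hypothetical placeholders, since the actual header of the dataset file is not shown in this pull request, and the sample data is made up for illustration.

```python
import csv
import gzip
import io


def summarize(source, compound_col="compound_id", target_col="target"):
    """Count rows, distinct targets, and distinct compounds in a gzipped CSV.

    `source` may be a file path or a binary file object.
    The column names are assumptions, not the dataset's documented schema.
    """
    rows = 0
    targets = set()
    compounds = set()
    # Stream the file so a multi-million-row CSV never has to fit in memory.
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        for record in csv.DictReader(handle):
            rows += 1
            targets.add(record[target_col])
            compounds.add(record[compound_col])
    return {"datapoints": rows, "kinases": len(targets), "compounds": len(compounds)}


# Tiny in-memory stand-in for the real file (invented rows, for illustration only).
sample_csv = "compound_id,target,active\nC1,EGFR,1\nC1,ABL1,0\nC2,EGFR,0\n"
sample = io.BytesIO(gzip.compress(sample_csv.encode("utf-8")))
summary = summarize(sample)
print(summary)
```

For the real download, a local file path would be passed in place of the in-memory sample, with the column arguments adjusted to whatever the file's header actually contains.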

sourcery-ai bot commented Oct 16, 2024

Reviewer's Guide by Sourcery

This pull request adds information about a new dataset called minKLIFSAI to the README.md file. The dataset contains a large number of datapoints for kinase compounds, collected from PubChem in January 2023.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

Change: Addition of minKLIFSAI dataset information
Details:
  • Adds a new entry for the minKLIFSAI dataset
  • Includes a brief description of the dataset's contents
  • Provides a direct download link to the dataset file on Zenodo
Files: README.md

@sourcery-ai sourcery-ai bot left a comment

Hey @phalem - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Thank you for adding the minKLIFSAI dataset. To maintain consistency with other entries, consider reformatting the description to be more concise and align with the style of surrounding entries. Also, please check for typos and use a landing page URL instead of a direct file download link if possible.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟡 Documentation: 2 issues found

@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
- [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
- [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
- [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
compounds collected from PubChem(Jan 2023).

suggestion (documentation): Add a space after 'PubChem' for better readability.

Suggested change
- compounds collected from PubChem(Jan 2023).
+ compounds collected from PubChem (Jan 2023).

@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
- [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
- [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
- [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive

suggestion (documentation): Standardize number formatting throughout the description.

Consider using either million/thousand or K consistently (e.g., '18.8 million', '300K', and '900K' or '18.8 million', '300 thousand', and '900 thousand').

Suggested change
- - [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
+ [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoints for 452 kinases, resembling 1.2 million unique compounds, consisting of 300 thousand active and 900 thousand inactive compounds collected from PubChem (Jan 2023).
