add minKLIFSAI data #74

phalem · 2024-10-16T00:47:37Z

I collected this data from PubChem in January 2023.

Summary by Sourcery

New Features:

Added minKLIFSAI dataset to the README, which includes 18.8 million datapoints for 452 kinases, representing 1.2 million unique compounds, with 300k active and 900k inactive compounds collected from PubChem.
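
For anyone who wants to sanity-check the datapoint count after downloading the file, here is a minimal sketch using only the standard library. It assumes the archive has been saved locally (for example as `enriched_target_valid.csv.gz`, the file name in the Zenodo link) and that its first row is a header. `count_data_rows` is a hypothetical helper, and no column names are assumed.

```python
import csv
import gzip


def count_data_rows(path: str) -> int:
    """Count data rows in a gzip-compressed CSV, excluding the header row.

    The file is streamed line by line, so it is never fully loaded into memory.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (assumed to be present)
        return sum(1 for _ in reader)
```

Calling `count_data_rows("enriched_target_valid.csv.gz")` on the downloaded file should return a figure close to the 18.8 million quoted above, if each row is one datapoint.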

sourcery-ai · 2024-10-16T00:47:42Z

Reviewer's Guide by Sourcery

This pull request adds information about a new dataset called minKLIFSAI to the README.md file. The dataset contains a large number of datapoints for kinase compounds, collected from PubChem in January 2023.

No diagrams generated as the changes look simple and do not need a visual representation.

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Addition of minKLIFSAI dataset information | Adds a new entry for the minKLIFSAI dataset; includes a brief description of the dataset's contents; provides a direct download link to the dataset file on Zenodo | `README.md` |

sourcery-ai

Hey @phalem - I've reviewed your changes - here's some feedback:

Overall Comments:

Thank you for adding the minKLIFSAI dataset. To maintain consistency with other entries, consider reformatting the description to be more concise and align with the style of surrounding entries. Also, please check for typos and use a landing page URL instead of a direct file download link if possible.
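
The landing-page suggestion can be sketched as a small URL transformation. This assumes that a Zenodo record's landing page is the file URL with the `/files/...` part and the query string removed, which is worth confirming in a browser before updating the README. `zenodo_landing_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def zenodo_landing_url(file_url: str) -> str:
    """Drop the '/files/...' suffix and the query string from a Zenodo file URL."""
    parts = urlsplit(file_url)
    # Keep only the path up to (not including) the '/files/' segment.
    record_path = parts.path.split("/files/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, record_path, "", ""))


direct = "https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1"
print(zenodo_landing_url(direct))  # prints https://zenodo.org/records/13370507
```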

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟢 Testing: all looks good
🟢 Complexity: all looks good
🟡 Documentation: 2 issues found

sourcery-ai · 2024-10-16T00:49:01Z

README.md

@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
 - [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
 - [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
 - [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
+- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive
+compounds collected from PubChem(Jan 2023).


suggestion (documentation): Add a space after 'PubChem' for better readability.

Suggested change

compounds collected from PubChem(Jan 2023).

compounds collected from PubChem (Jan 2023).
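
A fix like this could also be applied mechanically. The sketch below is only illustrative: `add_space_before_paren` is a hypothetical helper, and the regex inserts a space only when an opening parenthesis directly follows a letter, digit, or underscore, so Markdown links of the form `[text](url)` are left alone.

```python
import re


def add_space_before_paren(text: str) -> str:
    """Insert a space before '(' when it directly follows a word character."""
    return re.sub(r"(?<=\w)\(", " (", text)


print(add_space_before_paren("compounds collected from PubChem(Jan 2023)."))
# prints: compounds collected from PubChem (Jan 2023).
```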

sourcery-ai · 2024-10-16T00:49:01Z

README.md

@@ -62,6 +62,8 @@ Contributions are very welcome - please follow the [guidelines](CONTRIBUTING.md)
 - [MoleculeNet](https://moleculenet.org/datasets-1) - Benchmark suite that contains multiple datasets listed here
 - [oechem](https://ochem.eu/): On Feb 17 2023 OCHEM contained 3774118 records for 689 properties (with at least 50 records) collected from 20609 sources (user is granted a Creative Commons CC-BY (version 4.0) license to data submitted)
 - [Papyrus](https://doi.org/10.4121/16896406.v3): A large scale curated dataset aimed at bioactivity predictions. Contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with smaller datasets.
+- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive


suggestion (documentation): Standardize number formatting throughout the description.

Consider using either million/thousand or K consistently (e.g., '18.8 million', '300K', and '900K' or '18.8 million', '300 thousand', and '900 thousand').

Suggested change

- [minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoint for 452 kinase, resemble 1.2 million unique compounds, consisting of 300k active and 900K inactive

[minKLIFSAI](https://zenodo.org/records/13370507/files/enriched_target_valid.csv.gz?download=1): 18.8 million datapoints for 452 kinases, resembling 1.2 million unique compounds, consisting of 300 thousand active and 900 thousand inactive compounds collected from PubChem (Jan 2023).
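
As a rough illustration of the "thousand" variant of this suggestion, the sketch below rewrites `k`/`K` suffixes on numbers. `spell_out_thousands` is a hypothetical helper, and the regex is an assumption about how such counts are written. It only matches a number followed directly (or after one space) by a standalone `k` or `K`, so a phrase like "452 kinase" is not touched.

```python
import re


def spell_out_thousands(text: str) -> str:
    """Rewrite counts such as '300k' or '900K' as '300 thousand' / '900 thousand'."""
    return re.sub(r"(\d+(?:\.\d+)?)\s?[kK]\b", r"\1 thousand", text)


entry = "18.8 million datapoint for 452 kinase, consisting of 300k active and 900K inactive"
print(spell_out_thousands(entry))
# prints: 18.8 million datapoint for 452 kinase, consisting of 300 thousand active and 900 thousand inactive
```

The totals are also internally consistent: 300 thousand active plus 900 thousand inactive gives the 1.2 million unique compounds stated in the entry.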

add minKLIFSAI data

656108d
