Feature/data augmentation #52

Draft
wants to merge 36 commits into base: dev
vidvath7 (Collaborator) commented Sep 20, 2024

chebi_augmentation.yml:

aug_data:
Set this parameter to True to enable generation of the augmented dataset. When it is False, only the regular dataset is generated.

augment_data_batch_size:
Sets the batch size used when processing the augmented data.pkl file.

num_smiles_variations:
Specifies the maximum number of SMILES variations to generate for each compound. This caps the diversity, and therefore the size, of the augmented dataset.
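As a rough illustration of how these three parameters fit together, here is a minimal sketch that validates them once they have been read from the YAML file into a plain mapping. The `AugmentationConfig` class, the `config_from_mapping` helper, and the default values are all hypothetical and are not taken from this PR.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class AugmentationConfig:
    """Hypothetical container for the three parameters described above."""

    aug_data: bool = False                 # assumed default: augmentation off
    augment_data_batch_size: int = 10000   # assumed default, not from the PR
    num_smiles_variations: int = 5         # assumed default, not from the PR

    def __post_init__(self) -> None:
        # Both numeric parameters only make sense as positive integers.
        if self.augment_data_batch_size < 1:
            raise ValueError("augment_data_batch_size must be >= 1")
        if self.num_smiles_variations < 1:
            raise ValueError("num_smiles_variations must be >= 1")


def config_from_mapping(raw: Mapping[str, Any]) -> AugmentationConfig:
    """Build a config from an already-parsed mapping, rejecting unknown keys."""
    known = {f.name for f in fields(AugmentationConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown augmentation options: {sorted(unknown)}")
    return AugmentationConfig(**raw)
```

Rejecting unknown keys is a design choice in this sketch: it surfaces typos such as a misspelled parameter name instead of silently falling back to a default.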

chebi.py:

AugmentedDataExtractor class
Inherits from the main ChEBI data extractor and specializes it for generating augmented datasets. It accepts custom configuration such as the batch size and the number of SMILES variations.

augment_data():
Verifies the existence of the original data.pkl file and, if found, generates and saves the augmented data.pkl file in the specified augmented directory.
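The check-then-generate flow described for augment_data() could look roughly like the sketch below. It assumes, purely for illustration, that data.pkl holds a pickled list of dict records and that an `expand` callable turns one record into its augmented records; the real file layout and method signature in chebi.py may differ.

```python
import pickle
from pathlib import Path
from typing import Callable, Iterator, List


def iter_batches(items: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def augment_data(
    original_path: Path,
    augmented_dir: Path,
    batch_size: int,
    expand: Callable[[dict], List[dict]],  # hypothetical per-record augmenter
) -> Path:
    """Write an augmented data.pkl if the original data.pkl exists."""
    if not original_path.is_file():
        raise FileNotFoundError(f"original dataset not found: {original_path}")

    with original_path.open("rb") as fh:
        records = pickle.load(fh)  # assumed: a list of dict records

    augmented: List[dict] = []
    for batch in iter_batches(records, batch_size):
        for record in batch:
            augmented.extend(expand(record))

    # Save under the same file name inside the augmented directory.
    augmented_dir.mkdir(parents=True, exist_ok=True)
    out_path = augmented_dir / "data.pkl"
    with out_path.open("wb") as fh:
        pickle.dump(augmented, fh)
    return out_path
```

This sketch raises an error when the original file is missing; an implementation could just as well log a message and return early.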

generate_smiles_variations():
Produces SMILES variations of a compound using different settings, such as rooting the string at different atoms and randomization.
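One way to combine rooted and randomized SMILES generation is sketched below. The helper names `collect_variations` and `rdkit_variations` are hypothetical, and using RDKit's `Chem.MolToSmiles` with `rootedAtAtom` and `doRandom` is an assumption about the approach, not a description of the code in this PR. The collection logic is kept separate from RDKit so that it can be exercised without the library installed.

```python
from typing import Callable, List


def collect_variations(
    original: str,
    num_atoms: int,
    max_variations: int,
    render: Callable[[int, bool], str],  # render(root_atom, randomize) -> SMILES
) -> List[str]:
    """Collect up to max_variations distinct SMILES that differ from original."""
    seen = {original}
    variations: List[str] = []

    def keep(candidate: str) -> None:
        if candidate not in seen:
            seen.add(candidate)
            variations.append(candidate)

    # Pass 1: one deterministic string per root atom.
    for root in range(num_atoms):
        if len(variations) >= max_variations:
            break
        keep(render(root, False))

    # Pass 2: randomized strings, with a bounded number of attempts so that
    # small molecules with few distinct strings cannot loop forever.
    attempts = 0
    while len(variations) < max_variations and attempts < 10 * max_variations:
        keep(render(-1, True))
        attempts += 1
    return variations


def rdkit_variations(smiles: str, max_variations: int) -> List[str]:
    """RDKit-backed variant; requires the third-party rdkit package."""
    from rdkit import Chem  # imported lazily so the helper above stays stdlib-only

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # unparsable input yields no variations
        return []

    def render(root: int, randomize: bool) -> str:
        return Chem.MolToSmiles(
            mol, rootedAtAtom=root, canonical=False, doRandom=randomize
        )

    return collect_variations(smiles, mol.GetNumAtoms(), max_variations, render)
```

The returned list can be shorter than `max_variations`, which matches the description of num_smiles_variations as a maximum rather than an exact count.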

setup_processed():
Prepares processed data for the augmented ChEBI dataset by transforming and saving it in the required format.

sfluegel05 marked this pull request as draft September 20, 2024 08:41