Technical issues in Tox21MolNet + corresponding Tests #53

Open
3 tasks
aditya0by0 opened this issue Sep 20, 2024 · 0 comments · May be fixed by #56
aditya0by0 commented Sep 20, 2024

Technical issues in Tox21MolNet:

Issue 1: Missing group key

I've encountered an issue with the setup_processed method when working with Tox21MolNet and its data file (tox21.csv). The file does not include a header or key named "group", which causes a KeyError on this line:

groups = np.array([d["group"] for d in data])

Additionally, the _load_data_from_file method does not use any Reader to create or handle a "group" key in the data. As a result, the "group" key does not exist in the dictionaries produced by _load_data_from_file, which leads to the error above.
The _load_data_from_file method yields only three keys: features, labels, and ident:

yield dict(features=smiles, labels=labels, ident=row["mol_id"])
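
To make the failure concrete, here is a minimal sketch that reproduces the KeyError with dictionaries shaped like the ones above. The names `load_rows` and `extract_groups` are hypothetical stand-ins, and returning None when the key is absent is only one possible way to handle it, not necessarily the fix the project will adopt.

```python
def load_rows():
    """Hypothetical stand-in for _load_data_from_file: yields the same three keys."""
    yield dict(features="CCO", labels=[False, True], ident="TOX1")
    yield dict(features="c1ccccc1", labels=[True, False], ident="TOX2")


def extract_groups(data):
    """Return the "group" values, or None if any entry lacks the key (assumed policy)."""
    if all("group" in d for d in data):
        return [d["group"] for d in data]
    return None


data = list(load_rows())

# Unguarded access mirrors the failing line and raises KeyError.
missing_key = None
try:
    groups = [d["group"] for d in data]
except KeyError as err:
    missing_key = err.args[0]

# Guarded access lets the caller decide what to do when no groups exist.
groups = extract_groups(data)
```

With a guard like this, setup_processed could fall back to an ungrouped split when no group information is available.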

Issue 2: Generator Issue with train_test_split

Another issue arises from the use of a generator in the _load_data_from_file method. The generator object cannot be directly passed to train_test_split, as it expects a collection (e.g., a list or array). This causes the following error:

TypeError: Singleton array array(<generator object Tox21MolNet._load_data_from_file at 0x000001FD068AB1B0>,
      dtype=object) cannot be considered a valid collection.

Solution: To fix this, the generator output should be converted to a list before using it for splitting:

data = list(self._load_data_from_file(os.path.join(self.raw_dir, "tox21.csv")))
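
The error message suggests that no sample count could be determined for the generator. The sketch below illustrates the underlying cause using only the standard library, so it does not call train_test_split itself. `load_rows` is a hypothetical stand-in for _load_data_from_file.

```python
def load_rows():
    """Hypothetical stand-in for _load_data_from_file."""
    for i in range(5):
        yield dict(features=f"C{i}", labels=[i % 2 == 0], ident=f"TOX{i}")


gen = load_rows()

# A generator has no length, so code that needs a sample count cannot use it.
try:
    len(gen)
    generator_has_len = True
except TypeError:
    generator_has_len = False

# Materialising the generator gives a sized, indexable collection.
data = list(load_rows())
n_samples = len(data)

# A generator can also be consumed only once: the second pass is empty.
first_pass = list(gen)
second_pass = list(gen)
```

Converting to a list up front also avoids the single-pass problem if the data is iterated again later, for example once for the split and once for group extraction.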

Tests

  • Tox21MolNet:
    • Write unit tests for setup_processed() with mock data.
      • Check that the output format is correct: the collator expects a dict with features, labels, and ident keys, and features must be convertible to a tensor.
    • Write unit tests for _load_data_from_file() using mock file operations.
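
A test for _load_data_from_file with mocked file operations could look roughly like the sketch below. It is self-contained, so `load_data_from_file` is a hypothetical stand-in for the real method. The CSV layout (two label columns plus mol_id and smiles) and the handling of empty cells as None are assumptions made for illustration.

```python
import csv
import unittest
from unittest.mock import mock_open, patch

# Assumed CSV layout: two label columns, then mol_id and smiles.
MOCK_CSV = (
    "NR-AR,SR-p53,mol_id,smiles\n"
    "0,1,TOX1,CCO\n"
    "1,,TOX2,c1ccccc1\n"
)
LABEL_COLUMNS = ["NR-AR", "SR-p53"]


def load_data_from_file(path):
    """Hypothetical stand-in for Tox21MolNet._load_data_from_file."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Empty cells are treated as missing labels (an assumption).
            labels = [bool(int(row[c])) if row[c] else None for c in LABEL_COLUMNS]
            yield dict(features=row["smiles"], labels=labels, ident=row["mol_id"])


class LoadDataFromFileTest(unittest.TestCase):
    def test_yields_expected_keys(self):
        # Patch open() so no real file is read; consume the generator while patched.
        with patch("builtins.open", mock_open(read_data=MOCK_CSV)):
            data = list(load_data_from_file("tox21.csv"))

        self.assertEqual(len(data), 2)
        for entry in data:
            self.assertEqual(set(entry), {"features", "labels", "ident"})
        self.assertEqual(data[0]["ident"], "TOX1")
        self.assertEqual(data[0]["labels"], [False, True])
        self.assertEqual(data[1]["labels"], [True, None])
```

The test can be run with `python -m unittest`. In the real test suite, the patch target and the expected label format would need to match the actual Tox21MolNet implementation.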
@aditya0by0 aditya0by0 changed the title **KeyError: group** in setup_processed method of Tox21MolNet KeyError: group in setup_processed method of Tox21MolNet Sep 20, 2024
@aditya0by0 aditya0by0 self-assigned this Sep 20, 2024
@aditya0by0 aditya0by0 changed the title KeyError: group in setup_processed method of Tox21MolNet Technical issues in Tox21MolNet Sep 25, 2024
@aditya0by0 aditya0by0 linked a pull request Sep 25, 2024 that will close this issue
aditya0by0 added a commit that referenced this issue Oct 20, 2024
- this test will be added in another branch later once #53 is completed
@aditya0by0 aditya0by0 mentioned this issue Oct 20, 2024
31 tasks
@aditya0by0 aditya0by0 changed the title Technical issues in Tox21MolNet Technical issues in Tox21MolNet + corresponding Tests Nov 7, 2024