Potential Processing Error in the Original QM8 Dataset on Some Tasks #41

Open
rbharath opened this issue Nov 15, 2021 · 0 comments

Comments

@rbharath
Member

There is a potential error in the QM8 dataset from the original MoleculeNet paper: some task columns appear to be duplicated, possibly because of a pandas data-processing error.

deepchem/deepchem#2747

We are still working to verify the error, but in the meantime there is a fix PR under review that you can use:

deepchem/deepchem#2756

Assuming the error is indeed present, the QM8 benchmarks may need to be rerun. The duplicated columns belong to two very similar tasks, though: both predict DFT results on the same molecule, computed with the same functional but different basis sets. I therefore suspect that the qualitative changes will be relatively minimal, since models have in effect been predicting one DFT run twice instead of two slightly different DFT runs.
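If you want to check a local copy of the labels yourself, one rough way is to compare every pair of columns for identical values. The snippet below is only a sketch on a toy DataFrame: the column names (`task_a`, `task_a.1`, `task_b`) and the helper `find_identical_columns` are made up for illustration and are not the real QM8 task names or a DeepChem API. The `.1` suffix imitates how `pandas.read_csv` renames a repeated header by default; whether the QM8 file shows that exact pattern is not confirmed here.

```python
import pandas as pd


def find_identical_columns(df):
    """Return pairs of column labels whose values match row for row.

    Hypothetical helper for illustration; not part of DeepChem or pandas.
    """
    labels = list(df.columns)
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            # Compare by position so repeated labels cannot confuse the lookup.
            if df.iloc[:, i].equals(df.iloc[:, j]):
                pairs.append((labels[i], labels[j]))
    return pairs


# Toy stand-in for a label table; the values and names are invented.
toy = pd.DataFrame({
    "task_a": [0.21, 0.35, 0.18],
    "task_a.1": [0.21, 0.35, 0.18],  # exact copy of task_a
    "task_b": [0.22, 0.34, 0.19],    # similar, but not identical
})

duplicate_pairs = find_identical_columns(toy)
print(duplicate_pairs)
```

On the toy data only the exact copy is reported; `task_b` is close to `task_a` but not identical, so it is not flagged. The same check could be run on the task columns of the real CSV after loading it with `pd.read_csv`.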

@rbharath changed the title from "Potential Porcessing Error in the Original QM8 Dataset on Some Tasks" to "Potential Processing Error in the Original QM8 Dataset on Some Tasks" on Nov 15, 2021