Processing of QM7/QM9 Targets #42

Open
FelixKatz77 opened this issue Nov 16, 2021 · 4 comments

Comments

@FelixKatz77

Hi,
I wanted to use the splits of the QM7 and QM8 datasets for benchmarking when I noticed a discrepancy between the targets accessible via the load_qm7()/load_qm8() functions and the original targets of these datasets (http://quantum-machine.org/datasets/). I could not find any information on any processing of the targets. Could you clarify if any normalisation or rescaling was done?

I was also wondering how the benchmark performance was determined in the case of multitask datasets. In these cases, was a single task taken into account or the performance on all tasks? Thanks!

@rbharath
Member

I believe that the outputs are normalized (see https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html#qm7-datasets, and the linked source). The discrepancy between the load functions and the original datasets is a little disconcerting and is something we should investigate.

For benchmark performance, I believe it is the mean performance across all tasks, but I'm going from memory and may be wrong.
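
To make "mean performance across all tasks" concrete, here is a minimal sketch in plain Python. It assumes the metric is mean absolute error and that each row holds one value per task; the helper names and the toy numbers are made up for illustration and are not taken from the benchmark code.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE for a single task (two equal-length lists of floats)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_over_tasks(y_true, y_pred):
    """Score each task separately, then average the per-task scores.

    y_true and y_pred are lists of rows; each row has one value per task.
    Returns (per_task_scores, mean_of_scores).
    """
    n_tasks = len(y_true[0])
    per_task = []
    for k in range(n_tasks):
        true_k = [row[k] for row in y_true]  # column k = task k
        pred_k = [row[k] for row in y_pred]
        per_task.append(mean_absolute_error(true_k, pred_k))
    return per_task, sum(per_task) / n_tasks


# Toy data: 3 molecules, 2 tasks on very different scales (illustrative only).
y_true = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
y_pred = [[1.5, 12.0], [2.0, 18.0], [2.5, 33.0]]
per_task, overall = mean_over_tasks(y_true, y_pred)
print(per_task, overall)
```

If the tasks are on different scales, as in the toy data, the averaged score is dominated by the larger-scale task, which is one reason it matters whether the targets were normalized before evaluation.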

@FelixKatz77
Author

I think the target processing is relevant to all the regression tasks. I tried to figure out the mapping between the targets in the datasets downloaded from https://moleculenet.org/datasets-1 and the targets you can access via the 'y' label after loading the datasets via dc.molnet.load_dataset() but could not figure it out. Would be great if you could comment on this.

@FelixKatz77
Author

I figured out the normalization using the 'transformers' argument in dc.molnet.load_dataset().
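
For anyone else comparing the loaded 'y' values with the raw files, here is a minimal sketch of the mapping, assuming the normalization is a per-task z-score (subtract the mean, divide by the standard deviation). The function names and the sample values below are hypothetical, and the use of the population standard deviation is an assumption; in DeepChem itself the transformer objects returned by the loader should be able to reverse the scaling via their `untransform` method.

```python
from statistics import fmean, pstdev


def fit_zscore(column):
    """Return (mean, std) for one task's raw targets.

    Assumption: population standard deviation (ddof = 0).
    """
    return fmean(column), pstdev(column)


def normalize(column, mean, std):
    """Map raw targets to zero mean and unit variance."""
    return [(y - mean) / std for y in column]


def denormalize(column, mean, std):
    """Invert normalize(): recover targets in their original units."""
    return [y * std + mean for y in column]


# Made-up raw targets for a single task (not real QM7 values).
raw = [-417.96, -712.42, -564.21, -404.88]
mean, std = fit_zscore(raw)
scaled = normalize(raw, mean, std)
restored = denormalize(scaled, mean, std)
print(scaled)
print(restored)
```

With this convention, errors reported on normalized targets can be converted back to the original units by multiplying by the per-task standard deviation.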

@FelixKatz77
Author

If you have any more insights into the benchmarking for multitask datasets, I would still be happy to hear about them.
Thanks!

@FelixKatz77 FelixKatz77 reopened this Nov 17, 2021