What Can We Do Better? #1

Open
lilleswing opened this issue Feb 24, 2020 · 2 comments

Comments

@lilleswing
Member

The MoleculeNet publication has accomplished much by providing standardized problems for supervised learning over chemical structures. However, over the past couple of years, we have seen some barriers to entry in using the datasets. How can we make them easier to use?

This issue can serve as a brainstorming page for ways to make the MoleculeNet datasets more accessible to machine learning practitioners.

@rbharath
Member

Here are a couple of my observations so far:

  • There's a lot of scope to extend MoleculeNet into materials science applications. There's a lot of interest in fields such as electrolyte design for batteries, where better benchmarks could help.
  • Many new protein-ligand binding datasets have become available now that cryo-EM data is more accessible. We should expand the collection of protein-ligand datasets.
  • Perhaps new crystal structure datasets?

@rbharath
Member

We should make sure there's a stable mechanism for splitting datasets so that benchmarking is easy and reproducible. This repo has some code that improves split stability, which was an issue in the original MoleculeNet:

https://github.com/shenwanxiang/ChemBench
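For illustration only, here is a minimal sketch of one meaning of "stable," assuming each molecule has a fixed string identifier such as a canonical SMILES. Each molecule's split is derived from a hash of its own identifier, so the assignment does not depend on row order, random seeds, or which other molecules are in the dataset. The `stable_split` function and its `frac_train`/`frac_valid` parameters are hypothetical. This is a plain hash-based random split, not a scaffold split, and it is not the ChemBench or DeepChem implementation.

```python
import hashlib


def stable_split(ids, frac_train=0.8, frac_valid=0.1):
    """Hypothetical helper: assign each identifier to train/valid/test
    deterministically from a SHA-256 hash of the identifier itself."""
    splits = {"train": [], "valid": [], "test": []}
    for mol_id in ids:
        digest = hashlib.sha256(mol_id.encode("utf-8")).hexdigest()
        # Map the first 32 bits of the hash to a number in [0, 1).
        bucket = int(digest[:8], 16) / 2**32
        if bucket < frac_train:
            splits["train"].append(mol_id)
        elif bucket < frac_train + frac_valid:
            splits["valid"].append(mol_id)
        else:
            splits["test"].append(mol_id)
    return splits


ids = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "O=C=O", "C1CCCCC1"]
print(stable_split(ids))
```

Because each bucket depends only on the molecule's own identifier, appending new molecules leaves existing assignments unchanged. The trade-off is that split sizes only approximate the requested fractions, especially on small datasets.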

rbharath pushed a commit that referenced this issue Jan 29, 2021