Some molecules in industry benchmark set have invalid CMILES #327

Open
j-wags opened this issue Aug 16, 2023 · 1 comment

Comments

@j-wags
Member

j-wags commented Aug 16, 2023

cc #207
cc openforcefield/openff-qcsubmit#228
cc openforcefield/openff-toolkit#1696

The pattern [NH+:1] shouldn't be valid in CMILES, since it doesn't identify which map value the attached hydrogen gets. However, a few entries with this problem seem to have snuck into the industry benchmark set.

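```python
from qcportal import FractalClient

client = FractalClient()  # default address is the public QCArchive server
col = client.get_collection(collection_type="OptimizationDataset", name="OpenFF Industry Benchmark Season 1 v1.1")

# Names of entries whose CMILES contains "NH+", i.e. a nitrogen with an implicit, unmapped hydrogen
cmiles_key = "canonical_isomeric_explicit_hydrogen_mapped_smiles"
bad = [name for name, entry in col.data.records.items() if "NH+" in entry.attributes[cmiles_key]]
```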

An example CMILES with this problem is [F:1][c:2]1[c:3]([H:32])[c:4]([H:33])[c:5]([H:34])[c:6]([F:7])[c:8]1[C:9]1=[N:12][N:13]2[C:14](=[C:15]([H:37])[N:16]=[C:17]2[N:18]([c:19]2[c:20]([H:39])[nH+:21][c:22]([H:40])[c:23]([H:41])[c:24]2[N:25]2[C:26]([H:42])([H:43])[C@:30]([NH+:31]([H:51])[H:52])([H:50])[C:29]([H:48])([H:49])[C:28]([H:46])([H:47])[C:27]2([H:44])[H:45])[H:38])[C:11]([H:36])=[C:10]1[H:35]
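
The substring check above is case-sensitive, so it only catches the uppercase `NH+` form. As a rough sketch of a broader check, the snippet below scans a CMILES string for any bracket atom that carries a hydrogen count. The regex and the helper name `find_implicit_h_atoms` are illustrative assumptions, not part of qcportal or the OpenFF toolkit, and the pattern is a simplified reading of SMILES bracket atoms rather than a full parser. It also flags the lowercase `[nH+:21]` in the example; whether that atom should be treated as the same problem is an assumption here.

```python
import re
from typing import List

# Simplified pattern for one SMILES bracket atom: optional isotope, element
# symbol, optional chirality (@ or @@), optional hydrogen count, optional
# charge, optional atom map index. This is not a full SMILES grammar.
BRACKET_ATOM = re.compile(
    r"\[\d*(?:[A-Z][a-z]?|[a-z]{1,2})@{0,2}"
    r"(?P<hcount>H\d*)?(?:[+-]+\d*)?(?::\d+)?\]"
)


def find_implicit_h_atoms(cmiles: str) -> List[str]:
    """Return bracket atoms that carry a hydrogen count, e.g. '[NH+:31]'.

    Hypothetical helper for illustration. An explicit hydrogen atom such as
    '[H:32]' is not reported, because there 'H' is the element symbol and
    not a hydrogen count.
    """
    return [m.group(0) for m in BRACKET_ATOM.finditer(cmiles) if m.group("hcount")]


# Two fragments copied from the example CMILES in this issue.
frag_aromatic = "[c:20]([H:39])[nH+:21][c:22]([H:40])"
frag_amine = "[C@:30]([NH+:31]([H:51])[H:52])([H:50])"

print(find_implicit_h_atoms(frag_aromatic))  # ['[nH+:21]']
print(find_implicit_h_atoms(frag_amine))     # ['[NH+:31]']
```

Applied to the `bad` list comprehension, this would replace the `"NH+" in ...` test with `find_implicit_h_atoms(...)`, at the cost of relying on a hand-written pattern instead of a cheminformatics toolkit.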

@mattwthompson
Member

So it isn't lost to the sands of time - I think there was a dump of this data produced at some point and modified to fix these CMILES. And instead of pulling down the entire dataset from QCArchive, that JSON is used as a starting point for analysis, etc.
