KekulizeException: Can't kekulize mol. #8

Open
RudrajitDawn opened this issue Apr 27, 2022 · 6 comments

Comments

@RudrajitDawn

Hi, I am trying to run scripts/preprocess.py and I get this error:

"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "", line 28, in tensorize_pair
    mol_tree0 = tensorize(smiles_pair[0], assm=False)
  File "", line 12, in tensorize
    mol_tree = MolTree(smiles)
  File "", line 95, in __init__
    cmol = get_clique_mol(self.mol, c)
  File "", line 68, in get_clique_mol
    smiles = Chem.MolFragmentToSmiles(mol, atoms, kekuleSmiles=True)
rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol. Unkekulized atoms: 1 2 3 4 26
"""

The above exception was the direct cause of the following exception:

KekulizeException                         Traceback (most recent call last)
in <module>()
     48 data = [line.strip("\r\n ").split()[:2] for line in f]
     49
---> 50 all_data = pool.map(tensorize_pair, data)
     51 num_splits = len(data) / 10000
     52

1 frames
/usr/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658
    659     def _set(self, i, obj):

KekulizeException: Can't kekulize mol. Unkekulized atoms: 1 2 3 4 26
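
The second traceback shows why a single bad molecule stops the whole run: `Pool.map` collects the worker results and re-raises the first worker exception in the parent process (the `raise self._value` frame). A minimal sketch of the "skip these molecules" workaround is to wrap the worker so that failures come back as `None` and are filtered out afterwards. `SkipOnError` and `drop_failures` are hypothetical helpers, not part of the repository, and the sketch assumes the wrapped function never legitimately returns `None`.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple, Type


class SkipOnError:
    """Hypothetical wrapper that returns None instead of raising.

    Defined at module level (not as a lambda or closure) so that
    multiprocessing can pickle it and send it to worker processes.
    """

    def __init__(self, func: Callable[[Any], Any],
                 exceptions: Tuple[Type[BaseException], ...] = (Exception,)):
        self.func = func
        self.exceptions = exceptions

    def __call__(self, item: Any) -> Optional[Any]:
        try:
            return self.func(item)
        except self.exceptions:
            # Only the listed exception types are swallowed; others still propagate.
            return None


def drop_failures(items: Sequence[Any],
                  results: Sequence[Optional[Any]]) -> Tuple[List[Any], List[Any]]:
    """Split mapped results into successes and the inputs whose result was None."""
    kept: List[Any] = []
    failed: List[Any] = []
    for item, result in zip(items, results):
        if result is None:
            failed.append(item)
        else:
            kept.append(result)
    return kept, failed
```

In preprocess.py this would presumably become `all_data = pool.map(SkipOnError(tensorize_pair, (KekulizeException,)), data)` followed by `all_data, bad_pairs = drop_failures(data, all_data)`, with `KekulizeException` imported from `rdkit.Chem.rdchem` (the class named in the traceback). Narrowing the exception tuple this way keeps unrelated bugs visible instead of silently dropping those pairs as well.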

@xinyangATK

Have you solved it? I ran into the same problem and skipped those molecules, but then hit another error:

Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 93, in __getitem__
    return tensorize(batch0, self.vocab, assm=False), tensorize(batch1, self.vocab, assm=self.y_assm)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 109, in tensorize
    set_batch_nodeID(tree_batch, vocab)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 136, in set_batch_nodeID
    node.wid = vocab.get_index(node.smiles)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/mol_tree.py", line 18, in get_index
    return self.vmap[smiles]
KeyError: 'C1=CNN=C1'

Does this KeyError mean that I should regenerate the vocabulary for the QED dataset?

@RudrajitDawn
Author

I made a working version of the code here: graph2graph_molecule

@xinyangATK

I made some working version of the code here: graph2graph_molecule

Did you just replace the code

all_data = pool.map(tensorize_pair, data)

with

process_map(tensorize_pair, data, max_workers=16, chunksize=1)

That does fix the first problem, but I still get the KeyError I mentioned above.

@RudrajitDawn
Author

You could open a new issue for the new problem. I think the key should be in "vocab.txt", so if it is missing, it can either be added manually or the vocabulary can be re-extracted from the list of SMILES strings using "mol_tree.py".
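
A rough sketch of the "is the key in vocab.txt?" check, assuming vocab.txt stores one SMILES string per line and that the vocabulary lookup is a plain dict built from that file (the traceback shows `return self.vmap[smiles]`). In practice the cluster SMILES would come from `node.smiles` of the tree nodes built by the repository's `MolTree`; here they are passed in as a plain list so the sketch runs without RDKit. `load_vocab`, `missing_clusters` and `write_vocab` are hypothetical names.

```python
from typing import Iterable, List


def load_vocab(path: str) -> List[str]:
    """Read a vocabulary file, assumed to contain one SMILES string per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def missing_clusters(vocab: Iterable[str], cluster_smiles: Iterable[str]) -> List[str]:
    """Return cluster SMILES absent from the vocabulary (those that would raise
    KeyError on a dict lookup), deduplicated, in first-seen order."""
    known = set(vocab)
    missing: List[str] = []
    for smi in cluster_smiles:
        if smi not in known:
            known.add(smi)  # report each missing SMILES only once
            missing.append(smi)
    return missing


def write_vocab(path: str, vocab: Iterable[str]) -> None:
    """Write the vocabulary back out, one SMILES string per line, keeping its order."""
    with open(path, "w") as f:
        for smi in vocab:
            f.write(smi + "\n")
```

Writing `vocab + missing` appends the new entries after the existing ones, so previously assigned indices do not shift. A model already trained on the old vocabulary would likely still need retraining, since its vocabulary-sized layers would not cover the new entries; regenerating the whole vocabulary from the SMILES list avoids patching entries one at a time.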

@zkysfls

zkysfls commented Sep 26, 2024

Hi! Did you ever solve the KeyError: 'C1=CNN=C1' problem above?

@RudrajitDawn
Author

I don't remember whether I solved it.
