KekulizeException: Can't kekulize mol. #8

RudrajitDawn · 2022-04-27T13:34:23Z

Hi, I am trying to run scripts/preprocess.py and I get this error:

"""
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "", line 28, in tensorize_pair
mol_tree0 = tensorize(smiles_pair[0], assm=False)
File "", line 12, in tensorize
mol_tree = MolTree(smiles)
File "", line 95, in __init__
cmol = get_clique_mol(self.mol, c)
File "", line 68, in get_clique_mol
smiles = Chem.MolFragmentToSmiles(mol, atoms, kekuleSmiles=True)
rdkit.Chem.rdchem.KekulizeException: Can't kekulize mol. Unkekulized atoms: 1 2 3 4 26
"""

The above exception was the direct cause of the following exception:

KekulizeException Traceback (most recent call last)
in ()
48 data = [line.strip("\r\n ").split()[:2] for line in f]
49
---> 50 all_data = pool.map(tensorize_pair, data)
51 num_splits = len(data) / 10000
52

1 frames
/usr/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
655 return self._value
656 else:
--> 657 raise self._value
658
659 def _set(self, i, obj):

KekulizeException: Can't kekulize mol. Unkekulized atoms: 1 2 3 4 26
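
Because pool.map re-raises the first worker exception, a single molecule that cannot be kekulized aborts the whole run. One possible workaround, shown here only as a minimal sketch, is to wrap the worker so that failures are recorded and skipped. The names SkipFailures and map_skipping_failures are hypothetical and are not part of the repository; the wrapper catches any exception, not just rdkit's KekulizeException.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class SkipFailures:
    """Hypothetical picklable wrapper: report an error instead of raising it."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __call__(self, item: Any) -> Tuple[Any, Any, Optional[str]]:
        try:
            return item, self.func(item), None
        except Exception as exc:  # e.g. rdkit's KekulizeException
            return item, None, f"{type(exc).__name__}: {exc}"


def map_skipping_failures(
    func: Callable[[Any], Any],
    items: Iterable[Any],
    mapper: Callable[..., Iterable[Any]] = map,
) -> List[Tuple[Any, Any]]:
    """Apply func to every item and keep only the (item, result) pairs that succeeded.

    mapper can be the built-in map or a bound method such as pool.map.
    """
    kept = []
    for item, result, error in mapper(SkipFailures(func), items):
        if error is None:
            kept.append((item, result))
        else:
            print(f"skipping {item!r}: {error}")
    return kept
```

In the preprocessing script this would be called as map_skipping_failures(tensorize_pair, data, mapper=pool.map). Note that skipped molecules may still contribute substructures that the vocabulary never sees, so the vocabulary may need to be rebuilt from the filtered data.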

xinyangATK · 2023-07-15T12:29:47Z

Have you solved it? I ran into the same problem. I skipped the molecules that fail, but then hit another error:

Original Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ubuntu/anaconda3/envs/lxy/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 93, in __getitem__
    return tensorize(batch0, self.vocab, assm=False), tensorize(batch1, self.vocab, assm=self.y_assm)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 109, in tensorize
    set_batch_nodeID(tree_batch, vocab)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/datautils.py", line 136, in set_batch_nodeID
    node.wid = vocab.get_index(node.smiles)
  File "/mnt/507aa612-08ac-464f-84b1-98511af354c1/LXY/iclr19-graph2graph/fast_jtnn/mol_tree.py", line 18, in get_index
    return self.vmap[smiles]
KeyError: 'C1=CNN=C1'

Does this KeyError mean that I should regenerate the vocab for the QED dataset?

RudrajitDawn · 2023-07-15T13:09:23Z

I made a working version of the code here: graph2graph_molecule

xinyangATK · 2023-07-15T15:51:23Z

Did you just replace the line

all_data = pool.map(tensorize_pair, data)

with

process_map(tensorize_pair, data, max_workers=16, chunksize=1)

That does fix the first problem, but I still get the KeyError that I mentioned above.

RudrajitDawn · 2023-07-15T17:34:37Z

You can open a new issue for the new problem. I think the key should be in "vocab.txt". If it is not there, you can add it manually, or you can regenerate the vocabulary from your list of SMILES strings using "mol_tree.py".
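
To see which keys are missing before training, something like the following could help. It is only a sketch: it assumes "vocab.txt" holds one SMILES string per line, and build_vmap and missing_from_vocab are hypothetical helpers, not functions from the repository. The lookup is an exact string match, so the cluster SMILES must be the same strings the code produces for node.smiles.

```python
from typing import Dict, Iterable, List


def build_vmap(vocab_lines: Iterable[str]) -> Dict[str, int]:
    """Map each vocabulary SMILES to its index, one entry per non-empty line."""
    entries = [line.strip() for line in vocab_lines if line.strip()]
    return {smiles: idx for idx, smiles in enumerate(entries)}


def missing_from_vocab(cluster_smiles: Iterable[str], vmap: Dict[str, int]) -> List[str]:
    """Return the distinct cluster SMILES that a vmap lookup would reject, sorted."""
    return sorted(set(cluster_smiles) - set(vmap))


if __name__ == "__main__":
    # Toy data for illustration; in practice read the lines from vocab.txt.
    vmap = build_vmap(["C1=CC=CC=C1\n", "CC\n"])
    print(missing_from_vocab(["CC", "C1=CNN=C1"], vmap))
```

Any string reported here would raise the same KeyError in get_index, so it either has to be added to the vocabulary or the vocabulary has to be regenerated from the data actually used.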

zkysfls · 2024-09-26T20:55:02Z

Hi! Have you solved the KeyError: 'C1=CNN=C1' problem that xinyangATK described above?

RudrajitDawn · 2024-09-27T02:12:56Z

I forgot whether I solved it or not.
