RuntimeError when using calc.pandas #106

Open
alsalehf opened this issue Mar 20, 2023 · 1 comment

@alsalehf

description

I get stuck in a loop when using calc.pandas, and it ends in a runtime error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase

Below is the code I'm using for testing:
minimal reproduction code
from rdkit import Chem
from mordred import Calculator, descriptors
#import pandas as pd
import unicodedata

components = ["CCO"]

def s2d(smiles_list):
    final_list = [unicodedata.normalize("NFKD", ls) for ls in smiles_list]

    mols = [Chem.MolFromSmiles(smi) for smi in final_list]
    calc = Calculator(descriptors, ignore_3D=True)
    df3 = calc.pandas(mols)
    return df3

s2d(components)

df = s2d(components)
print(df)
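
For context, the RuntimeError text quoted above is the message Python's multiprocessing module raises when a new process is started while the main module is still being imported. This typically happens with the spawn start method (the default on Windows) when a script starts worker processes at module level without an `if __name__ == "__main__":` guard. Assuming calc.pandas starts worker processes by default, the module-level s2d(components) calls in the reproduction could trigger it. The sketch below shows the guard pattern with the standard library only; `square` and `compute` are hypothetical stand-ins for the descriptor calculation, not mordred functions.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def square(x):
    """Hypothetical stand-in for a per-molecule descriptor calculation."""
    return x * x


def compute(values, nproc=1):
    """Apply square to each value, serially or in worker processes."""
    if nproc == 1:
        # Serial path: no child processes are started.
        return [square(v) for v in values]
    # "spawn" is the default start method on Windows; each worker re-imports
    # the main module, so this path must only be reached from guarded code.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=nproc, mp_context=ctx) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    # Workers import this file under a different module name, so they skip
    # this block instead of trying to start workers of their own.
    print(compute([1, 2, 3], nproc=2))
```

In the reproduction, the equivalent change would be to move the s2d(components) calls and print(df) under such a guard. mordred's Calculator.pandas also appears to accept an nproc argument; if so, nproc=1 would keep the calculation in a single process. That is an assumption to check against the installed version.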


environment

I'm running the code on a Windows 10 machine in a venv environment.


conda or pip

pip.

python version

Python 3.10.4

library version


  • pip

Package         Version
--------------- --------
mordred         1.2.0
networkx        2.8.8
numpy           1.24.2
pandas          1.5.3
Pillow          9.4.0
pip             22.0.4
python-dateutil 2.8.2
pytz            2022.7.1
rdkit           2022.9.5
setuptools      58.1.0
six             1.16.0

pip show rdkit
Name: rdkit
Version: 2022.9.5
Summary: A collection of chemoinformatics and machine-learning software written in C++ and Python
Home-page: https://github.com/kuelumbus/rdkit-pypi
Author: Christopher Kuenneth
Author-email: [email protected]
License: BSD-3-Clause
Location: c:\users\admin\chemslenv\lib\site-packages
Requires: numpy, Pillow
Required-by:

@ismorphism

ismorphism commented Mar 25, 2023

@alsalehf Hello! That's a very important question, but the developers don't seem to have had time to answer it. My suggestion, based on my own experience, is to drop all the "bad" descriptors: they fail for you because of problems with your molecules' stereochemistry or something else. Here is what I use:

from mordred import Calculator, PBF, MomentOfInertia, TopologicalCharge, MolecularDistanceEdge, MoRSE, GravitationalIndex, GeometricalIndex, EState, DistanceMatrix, DetourMatrix, CPSA, BaryszMatrix, Autocorrelation, AdjacencyMatrix, descriptors, get_descriptors_from_module

descs = get_descriptors_from_module(descriptors, submodule=True)

# exclude some from descs
descs = filter(lambda d: ((d.__module__ != AdjacencyMatrix.__name__) and 
                          (d.__module__ != Autocorrelation.__name__) and
                          (d.__module__ != DetourMatrix.__name__) and 
                          (d.__module__ != BaryszMatrix.__name__) and 
                          (d.__module__ != CPSA.__name__) and 
                          (d.__module__ != DistanceMatrix.__name__) and 
                          (d.__module__ != EState.__name__) and 
                          (d.__module__ != GeometricalIndex.__name__) and 
                          (d.__module__ != GravitationalIndex.__name__) and 
                          (d.__module__ != MoRSE.__name__) and 
                          (d.__module__ != MolecularDistanceEdge.__name__) and 
                          (d.__module__ != MomentOfInertia.__name__) and 
                          (d.__module__ != PBF.__name__) and 
                          (d.__module__ != TopologicalCharge.__name__)), descs)

calc = Calculator(descs)
calc.pandas(mols)
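
The chained comparisons above can be written more compactly by collecting the excluded module names in a set. The sketch below shows that pattern on plain Python classes. `exclude_by_module`, `Kept` and `Dropped` are hypothetical names used for illustration, and the module names are assigned by hand, so nothing here calls mordred.

```python
def exclude_by_module(descs, excluded_modules):
    """Return the items of descs whose __module__ is not in excluded_modules."""
    excluded = set(excluded_modules)
    return [d for d in descs if d.__module__ not in excluded]


# Hypothetical stand-ins for descriptor classes from two different modules.
class Kept:
    pass


class Dropped:
    pass


Kept.__module__ = "pkg.GoodModule"
Dropped.__module__ = "pkg.BadModule"

remaining = exclude_by_module([Kept, Dropped], ["pkg.BadModule"])
print([cls.__name__ for cls in remaining])
```

With mordred, the second argument would be the `__name__` of each module to skip, for example `[AdjacencyMatrix.__name__, Autocorrelation.__name__]`, and the returned list could be passed to Calculator in place of the filter object.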
