ImportError: Could not find module 'torchmdnet_neighbors' in /torchmdnet/neighbors #226

Closed
hinostra opened this issue Oct 3, 2023 · 12 comments


hinostra commented Oct 3, 2023

Hello, I am trying to reproduce the results of the "Coarse Graining of Chignolin" section of the paper "TorchMD: A Deep Learning Framework for Molecular Simulations". I didn't have any problem when running the first part of the notebook Chignolin_Coarse-Grained_Tutorial.ipynb. However, when I tried to run the subsequent section I got the error message
ImportError: Could not find module 'torchmdnet_neighbors' in /content/drive/MyDrive/torchmdnet/neighbors

[Screenshots: 2023-10-03 at 3 29 13 PM, 2023-10-03 at 3 29 33 PM]

I want to run everything on Google Colab (at my PI's request), so I put the torchmdnet and torchmd repos in my Drive folder and also installed every required package listed in environment.yml. I am not sure what exactly is causing the error, so if anyone has an idea of how to solve it, it would be much appreciated!

Also, if you think I must run this tutorial on a local CPU/GPU machine or a cluster (that is, if that is the only way to do it), I would like to know.

@RaulPPelaez
Collaborator

Did you also run "pip install ." in the torchmd-net folder?
Note that we are also in the process of releasing a conda package for torchmd-net. As a side effect, there is currently an unlisted additional dependency: the CUDA SDK (nvcc).
You can also install torchmd-net via the acellera channel: "mamba install -c acellera torchmd-net".
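
A quick way to check whether the compiled extension is present at all, as a minimal sketch: the error message suggests that Python looked for a module named torchmdnet_neighbors inside the neighbors folder and found nothing, which is what you would expect if the source tree was copied to Drive but never built. The helper name `extension_present` is hypothetical (it is not part of torchmd-net), and the path is the one from the error message above.

```python
from importlib.machinery import PathFinder


def extension_present(module_name: str, directory: str) -> bool:
    """Return True if Python can locate `module_name` inside `directory`.

    Hypothetical helper, not part of torchmd-net. It only looks for the
    module (source file or compiled extension); it does not import it.
    """
    return PathFinder.find_spec(module_name, [directory]) is not None


if __name__ == "__main__":
    # Path taken from the error message in this issue; adjust to your setup.
    folder = "/content/drive/MyDrive/torchmdnet/neighbors"
    print(extension_present("torchmdnet_neighbors", folder))
```

If this prints False, the folder most likely contains only Python sources, and importing the package straight from that folder cannot work until the package has been built and installed.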

@hinostra
Author

hinostra commented Oct 4, 2023

This is the result of running 'pip install .' in the torchmd-net-main folder:
[Screenshot 2023-10-04 at 12 49 54 PM]

About the second suggestion, I couldn't use mamba on Google Colab. I tried to add the required libraries to the Python installation in Colab, as suggested in this Stack Overflow post:

!pip install -q condacolab
import condacolab
condacolab.install() # expect a kernel restart

# mount google drive to access data and torchmdnet module
from google.colab import drive
drive.mount('/content/drive')

# change current working directory
import os
os.chdir('/content/drive/My Drive/')

# here I use the environment.yml file on the main directory of torchmd-net
!mamba env update -n base -f environment.yml

But it also reports problems when it tries to load torchmdnet.

@RaulPPelaez
Collaborator

I believe the issue there is that you are supposed to put this in the first cell, on its own:

!pip install -q condacolab
import condacolab
condacolab.install() # expect a kernel restart

The last line restarts the kernel, so the rest of the cell will not run.
Installing the torchmd-net dependencies with pip is not straightforward. Some of them, NNPOps in particular, are not even available via pip.
The easiest way is to use the acellera channel.

Having said this, I have been trying to run it in Google Colab, but I keep running into CUDA-related issues created by the conda installation.

@hinostra
Author

hinostra commented Oct 5, 2023

I used

!mamba install -c acellera torchmd-net

Apparently everything was installed without problems, but then when I tried to import some libraries

import sys
import os
import argparse
import logging
import lightning.pytorch as pl
from lightning.pytorch.strategies import DDPStrategy
from lightning.pytorch.loggers import WandbLogger, CSVLogger, TensorBoardLogger
from lightning.pytorch.callbacks import (
    ModelCheckpoint,
    EarlyStopping,
)
from torchmdnet.module import LNNP
from torchmdnet import datasets, priors, models
from torchmdnet.data import DataModule
from torchmdnet.models import output_modules
from torchmdnet.models.model import create_prior_models
from torchmdnet.models.utils import rbf_class_mapping, act_class_mapping, dtype_mapping
from torchmdnet.utils import LoadFromFile, LoadFromCheckpoint, save_argparse, number
from lightning_utilities.core.rank_zero import rank_zero_warn

I got the error

/usr/local/lib/python3.10/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/usr/local/lib/python3.10/dist-packages/torchvision/image.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
[<ipython-input-4-6ec6d641277a>](https://localhost:8080/#) in <cell line: 5>()
      3 import argparse
      4 import logging
----> 5 import lightning.pytorch as pl
      6 from lightning.pytorch.strategies import DDPStrategy
      7 from lightning.pytorch.loggers import WandbLogger, CSVLogger, TensorBoardLogger

21 frames
[/usr/lib/python3.10/ctypes/__init__.py](https://localhost:8080/#) in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    372 
    373         if handle is None:
--> 374             self._handle = _dlopen(self._name, mode)
    375         else:
    376             self._handle = handle

OSError: /usr/local/lib/python3.10/dist-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZNK5torch8autograd4Node4nameEv

Also, I noticed that the versions of the packages installed from the acellera channel are different from those listed in the environment.yml file.

@hinostra
Author

hinostra commented Oct 5, 2023

New question! On my local machine (CPU only) I tried to create the environment using
mamba env create -f environment.yml
and
mamba env create -n torchmd-net
but I got the error message

Could not solve for environment specs
The following packages are incompatible
├─ gxx   does not exist (perhaps a typo or a missing channel);
├─ nnpops 0.5  does not exist (perhaps a typo or a missing channel);
└─ pytorch_geometric 2.3.1  is uninstallable because it requires

I also tried
mamba install -c acellera torchmd-net
and got

Could not solve for environment specs
The following package could not be installed
└─ torchmd-net   does not exist (perhaps a typo or a missing channel).

@RaulPPelaez
Collaborator

In Colab, the torch/CUDA versions installed as dependencies interfere with the torch/CUDA versions already available on the system. This is because in Colab you cannot create a new conda env; you can only use the base one, which installs to /usr/local.
I have not been able to overcome this.
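
To see this kind of interference for yourself, a small check along these lines can help. It is only a sketch: the helper name `describe` is hypothetical, and the three package names are the ones that show up in the traceback above. It reports the installed version and the on-disk location of each package without importing it, so a broken binary does not crash the check.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def describe(package: str):
    """Return (version, file location) for a top-level package.

    Either element is None when the information is missing. Nothing is
    imported, so a binary with undefined symbols cannot break the check.
    """
    try:
        ver = version(package)
    except PackageNotFoundError:
        ver = None
    spec = find_spec(package)
    origin = spec.origin if spec is not None else None
    return ver, origin


if __name__ == "__main__":
    # torchvision and torchaudio binaries are built against a specific
    # torch build, so these three are the ones to compare first.
    for name in ("torch", "torchvision", "torchaudio"):
        ver, origin = describe(name)
        print(f"{name}: version={ver}, location={origin}")
```

Packages that load from different folders, or a torchvision/torchaudio pair that was built for a different torch release than the one installed, would be consistent with the "undefined symbol" errors shown earlier.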

To install on your current machine, do this:

$ mamba create -n my_new_env -c acellera torchmd-net

@hinostra
Author

Hi, I tried to install with the command you mentioned

mamba create -n tochmdnet_env -c acellera torchmd-net

and I got an error

Could not solve for environment specs
The following package could not be installed
└─ torchmd-net   does not exist (perhaps a typo or a missing channel).

@RaulPPelaez
Collaborator

The package is there: https://anaconda.org/acellera/torchmd-net/
The command does not work on my laptop, but it works on a machine with a GPU.
Does your machine have a GPU?
It should work either way; I am trying to figure out why...

@RaulPPelaez
Collaborator

I found the origin of the problem; it is unrelated to GPUs. This is the output on my laptop:

$ mamba install -c acellera torchmd-net==0.7.1
/home/raul/mambaforge/lib/python3.10/site-packages/conda_package_streaming/package_streaming.py:19: UserWarning: zstandard could not be imported. Running without .conda support.
  warnings.warn("zstandard could not be imported. Running without .conda support.")
/home/raul/mambaforge/lib/python3.10/site-packages/conda_package_handling/api.py:29: UserWarning: Install zstandard Python bindings for .conda support
  _warnings.warn("Install zstandard Python bindings for .conda support")

Looking for: ['torchmd-net==0.7.1']

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
acellera/linux-64                                             No change
acellera/noarch                                               No change

Pinned packages:
  - python 3.10.*


Could not solve for environment specs
The following package could not be installed
└─ torchmd-net 0.7.1  does not exist (perhaps a typo or a missing channel).

The initial warnings are not present on the other machine.
When I try to import zstandard, I get:

$ python
Python 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zstandard
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/raul/mambaforge/lib/python3.10/site-packages/zstandard/__init__.py", line 39, in <module>
    from .backend_c import *  # type: ignore
ImportError: zstd C API versions mismatch; Python bindings were not compiled/linked against expected zstd version (10505 returned by the lib, 10502 hardcoded in zstd headers, 10502 hardcoded in the cext)

The zstandard library is used to decompress packages in the newer .conda format, which is compressed with the Zstandard algorithm. These packages are listed in their own section of the repodata.json files pulled from Anaconda. Without a working zstandard, mamba runs without .conda support and is "blind" to packages in that section, such as torchmd-net in this case.
In my particular case, this was due to a recent OS upgrade (Fedora 38 to 39), which changed some C libraries and broke mamba.
I had to reinstall mamba, and now the issue is solved.
Perhaps you are in a similar situation?
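
To check which section of a repodata.json file a package lives in, something like the following sketch could be used. The key names "packages" (for .tar.bz2 builds) and "packages.conda" (for .conda builds) are assumed from the conda repodata format rather than taken from this thread, and the helper name `sections_listing` and the demo data are made up for illustration.

```python
def sections_listing(repodata: dict, name: str) -> dict:
    """Count the builds of `name` in each repodata section.

    "packages" is assumed to hold .tar.bz2 builds and "packages.conda"
    the .conda builds; a missing section counts as zero.
    """
    counts = {}
    for section in ("packages", "packages.conda"):
        entries = repodata.get(section, {})
        counts[section] = sum(
            1 for meta in entries.values() if meta.get("name") == name
        )
    return counts


if __name__ == "__main__":
    # Tiny made-up repodata, only to show the shape of the check.
    demo = {
        "packages": {
            "other-1.0-0.tar.bz2": {"name": "other", "version": "1.0"},
        },
        "packages.conda": {
            "example-pkg-0.1-0.conda": {"name": "example-pkg", "version": "0.1"},
        },
    }
    print(sections_listing(demo, "example-pkg"))
```

A package that only has entries under "packages.conda" would be invisible to an installer that runs without .conda support, which matches the "does not exist" message above.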

@hinostra
Author

I uninstalled and then reinstalled Mamba, but the problem wasn't solved. I still get the following error message:
[Screenshot 2023-10-16 at 11 59 20 AM]
I also checked whether zstandard is installed, and it is:
zstandard 0.19.0 py310h8e9501a_0 conda-forge

@RaulPPelaez
Collaborator

Oh wait, you seem to be using OSX? There is no package built for OSX; some dependencies prevent that.
In fact, the acellera channel only provides packages for Linux:
https://anaconda.org/acellera/torchmd-net

I do not have experience with OSX, so I cannot really guide you through building the package yourself there.
It should, however, be CPU only, which simplifies things.
NNPOps is getting OSX support these days, and I think the lack of it is the main blocker for torchmd-net: openmm/NNPOps#115
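
Before trying the install on another machine, it may be worth confirming which conda platform folder that machine maps to. The sketch below is a rough guess at that mapping, not conda's own logic; the function name `conda_subdir` is hypothetical, and "linux-64" is the folder that appears in the mamba output earlier in this thread.

```python
import platform


def conda_subdir(system: str, machine: str) -> str:
    """Guess the conda platform folder (e.g. "linux-64") for a machine.

    `system` and `machine` are the values returned by platform.system()
    and platform.machine(). Unrecognized combinations give "unknown".
    """
    os_names = {"Linux": "linux", "Darwin": "osx", "Windows": "win"}
    arch_names = {
        "x86_64": "64",
        "AMD64": "64",
        "arm64": "arm64",
        "aarch64": "aarch64",
    }
    if system not in os_names or machine not in arch_names:
        return "unknown"
    return f"{os_names[system]}-{arch_names[machine]}"


if __name__ == "__main__":
    subdir = conda_subdir(platform.system(), platform.machine())
    if subdir == "linux-64":
        print("linux-64: matches the folder shown in the mamba output")
    else:
        print(f"{subdir}: check that the channel has builds for this platform")
```

On a Mac this reports an osx folder, for which, as noted above, the channel has no torchmd-net build.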

@hinostra
Author

Oh, I get it. Then I will try to do it on my laptop (which runs Linux).
Thank you for the continued support!
