Segmentation fault when creating simulation context with simple TorchForce force #88

Closed
FranklinHu1 opened this issue Nov 7, 2022 · 33 comments
Labels
help wanted Extra attention is needed

Comments

@FranklinHu1

Hello,

I am running into a segmentation fault when adding a simple TorchForce force to the alanine dipeptide test system from openmmtools. This is similar to issue #87, but I am trying and failing to do something far simpler.

For my environment, I am using the openmm-8-beta-linux environment generated from the following command:

conda env create mmh/openmm-8-beta-linux

The only modification I made to the environment is installing openmmtools to gain access to the alanine dipeptide system I have been using for debugging. A printout of my environment is as follows:

name: openmm-8-beta-linux
channels:
  - conda-forge/label/openmm-torch_rc
  - conda-forge/label/openmm_rc
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_kmp_llvm
  - astunparse=1.6.3=pyhd8ed1ab_0
  - attrs=22.1.0=pyh71513ae_1
  - blosc=1.21.1=h83bc5f7_3
  - brotlipy=0.7.0=py310h5764c6d_1005
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.18.1=h7f98852_0
  - ca-certificates=2022.9.24=ha878542_0
  - cached-property=1.5.2=hd8ed1ab_1
  - cached_property=1.5.2=pyha770c72_1
  - certifi=2022.9.24=pyhd8ed1ab_0
  - cffi=1.15.1=py310h255011f_2
  - cftime=1.6.2=py310hde88566_1
  - charset-normalizer=2.1.1=pyhd8ed1ab_0
  - colorama=0.4.6=pyhd8ed1ab_0
  - cryptography=38.0.3=py310h600f1e7_0
  - cudatoolkit=11.7.0=hd8887f6_10
  - cudnn=8.4.1.50=hed8a83a_0
  - curl=7.86.0=h2283fc2_1
  - exceptiongroup=1.0.1=pyhd8ed1ab_0
  - h5py=3.7.0=nompi_py310h416281c_102
  - hdf4=4.2.15=h9772cbc_5
  - hdf5=1.12.2=nompi_h4df4325_100
  - icu=70.1=h27087fc_0
  - idna=3.4=pyhd8ed1ab_0
  - importlib-metadata=5.0.0=pyha770c72_1
  - importlib_metadata=5.0.0=hd8ed1ab_1
  - iniconfig=1.1.1=pyh9f0ad1d_0
  - jpeg=9e=h166bdaf_2
  - keyutils=1.6.1=h166bdaf_0
  - krb5=1.19.3=h08a2579_0
  - lark-parser=0.12.0=pyhd8ed1ab_0
  - ld_impl_linux-64=2.39=hc81fddc_0
  - libblas=3.9.0=16_linux64_mkl
  - libcblas=3.9.0=16_linux64_mkl
  - libcurl=7.86.0=h2283fc2_1
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.4.2=h7f98852_5
  - libgcc-ng=12.2.0=h65d4601_19
  - libgfortran-ng=12.2.0=h69a702a_19
  - libgfortran5=12.2.0=h337968e_19
  - libiconv=1.17=h166bdaf_0
  - liblapack=3.9.0=16_linux64_mkl
  - libllvm11=11.1.0=he0ac6c6_5
  - libnetcdf=4.8.1=nompi_h261ec11_106
  - libnghttp2=1.47.0=hff17c54_1
  - libnsl=2.0.0=h7f98852_0
  - libprotobuf=3.20.1=h6239696_4
  - libsqlite=3.39.4=h753d276_0
  - libssh2=1.10.0=hf14f497_3
  - libstdcxx-ng=12.2.0=h46fd767_19
  - libuuid=2.32.1=h7f98852_1000
  - libxml2=2.10.3=h7463322_0
  - libzip=1.9.2=hc929e4a_1
  - libzlib=1.2.13=h166bdaf_4
  - llvm-openmp=14.0.4=he0ac6c6_0
  - llvmlite=0.39.1=py310h58363a5_1
  - lz4-c=1.9.3=h9c3ff4c_1
  - lzo=2.10=h516909a_1000
  - magma=2.5.4=hc72dce7_4
  - mdtraj=1.9.7=py310h902c554_2
  - mkl=2022.1.0=h84fe81f_915
  - mpiplus=v0.0.1=pyhd8ed1ab_1003
  - nccl=2.14.3.1=h0800d71_0
  - ncurses=6.3=h27087fc_1
  - netcdf4=1.6.1=nompi_py310h55e1e36_101
  - ninja=1.11.0=h924138e_0
  - nnpops=0.2=cuda112py310h8b99da5_5
  - nose=1.3.7=py_1006
  - numba=0.56.3=py310ha5257ce_0
  - numexpr=2.8.3=mkl_py310h0afd4a5_1
  - numpy=1.23.4=py310h53a5b5f_1
  - ocl-icd=2.3.1=h7f98852_0
  - ocl-icd-system=1.0.0=1
  - openmm=8.0.0beta=py310h2996cf7_2
  - openmm-ml=1.0beta=pyh79ba5db_2
  - openmm-torch=1.0beta=cuda112py310h02d4f52_1
  - openmmtools=0.21.5=pyhd8ed1ab_0
  - openssl=3.0.7=h166bdaf_0
  - packaging=21.3=pyhd8ed1ab_0
  - pandas=1.5.1=py310h769672d_1
  - pdbfixer=1.8.1=pyh6c4a22f_0
  - pip=22.3.1=pyhd8ed1ab_0
  - pluggy=1.0.0=pyhd8ed1ab_5
  - pycparser=2.21=pyhd8ed1ab_0
  - pymbar=3.1.0=py310hde88566_0
  - pyopenssl=22.1.0=pyhd8ed1ab_0
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - pysocks=1.7.1=pyha2e5f31_6
  - pytables=3.7.0=py310hb60b9b2_3
  - pytest=7.2.0=pyhd8ed1ab_2
  - python=3.10.6=ha86cf86_0_cpython
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.10=2_cp310
  - pytorch=1.11.0=cuda112py310h51fe464_1
  - pytz=2022.6=pyhd8ed1ab_0
  - pyyaml=6.0=py310h5764c6d_5
  - readline=8.1.2=h0f457ee_0
  - requests=2.28.1=pyhd8ed1ab_1
  - scipy=1.9.3=py310hdfbd76f_1
  - setuptools=59.5.0=py310hff52083_0
  - setuptools-scm=6.3.2=pyhd8ed1ab_0
  - setuptools_scm=6.3.2=hd8ed1ab_0
  - six=1.16.0=pyh6c4a22f_0
  - sleef=3.5.1=h9b69904_2
  - snappy=1.1.9=hbd366e4_2
  - tbb=2021.6.0=h924138e_1
  - tk=8.6.12=h27826a3_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - torchani=2.2.2=cuda112py310h98dee98_6
  - typing_extensions=4.4.0=pyha770c72_0
  - tzdata=2022f=h191b570_0
  - urllib3=1.26.11=pyhd8ed1ab_0
  - wheel=0.38.1=pyhd8ed1ab_0
  - xz=5.2.6=h166bdaf_0
  - yaml=0.2.5=h7f98852_2
  - zipp=3.10.0=pyhd8ed1ab_0
  - zlib=1.2.13=h166bdaf_4
  - zstd=1.5.2=h6239696_4
prefix: /home/frankhu/mambaforge/envs/openmm-8-beta-linux

The script I have been trying to run is as follows:

import openmmtools
import torch
from openmmtorch import TorchForce
import sys
from openmm import LangevinMiddleIntegrator
from openmm.app import Simulation
from openmm.unit import kelvin, picosecond, femtosecond
import openmm

ala2 = openmmtools.testsystems.AlanineDipeptideVacuum(constraints=None)

while ala2.system.getNumForces() > 0:
    print("removing force")
    ala2.system.removeForce(0)

assert(ala2.system.getNumConstraints() == 0)
assert(ala2.system.getNumForces() == 0)

#Simple Harmonic force from openmm-torch README
class ForceModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, positions):
        return torch.sum(positions**2)

module = torch.jit.script(ForceModule())
module.save("harmonic.pt")

force = TorchForce('harmonic.pt')
ala2.system.addForce(force)
assert(ala2.system.getNumForces() == 1)

temperature = 298.15 * kelvin
frictionCoeff = 1 / picosecond
timeStep = 1 * femtosecond
integrator = LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)
#import pdb as debug; debug.set_trace()
simulation = Simulation(ala2.topology, ala2.system, integrator) #The code segfaults within this call

As the comment indicates, the segmentation fault occurs while the Simulation is being constructed. The underlying cause is the initialization of the context: the same fault can be triggered directly by calling openmm.openmm.Context() with the modified alanine dipeptide system and the integrator. I have tried this on two different Linux systems and hit the same segmentation fault both times.

Any clarification or help would be greatly appreciated. Thank you!
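
Since the crash described above reproduces on some machines and not others, it can help to confirm the segfault programmatically. The following is a minimal sketch and not part of the original report: it runs a command in a child process and checks whether the child was killed by SIGSEGV. The helper name `crashed_with_segfault` and the two stand-in child commands are assumptions for illustration; the stand-ins are used so the sketch runs without OpenMM installed.

```python
import signal
import subprocess
import sys


def crashed_with_segfault(argv):
    """Run argv as a child process and report whether SIGSEGV killed it."""
    result = subprocess.run(argv, capture_output=True, text=True)
    # On POSIX, a return code of -N means the child was terminated by signal N.
    return result.returncode == -signal.SIGSEGV


# Stand-in children (hypothetical, for illustration only):
# one exits normally, the other sends SIGSEGV to itself.
clean_child = [sys.executable, "-c", "pass"]
crashing_child = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]

clean_result = crashed_with_segfault(clean_child)
crash_result = crashed_with_segfault(crashing_child)
print(clean_result, crash_result)
```

To check the actual reproduction script, the stand-in command would be replaced with something like `[sys.executable, "test.py"]`, where `test.py` is whatever the script is named. The sign convention on the return code is POSIX-specific, so this only applies on Linux and similar systems.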

@FranklinHu1 FranklinHu1 changed the title TorchForce segmentation fault when creating simulation context with simple TorchForce force Segmentation fault when creating simulation context with simple TorchForce force Nov 7, 2022
@peastman
Member

peastman commented Nov 7, 2022

Your script runs fine for me. Can you try running it inside gdb?

  1. Type gdb python.
  2. At the gdb prompt, type run test.py (or whatever the name of your script is).
  3. Wait for it to segfault, then type bt to get a stack trace.
  4. Post it here.
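
As a lighter-weight complement to the gdb steps above, Python's standard `faulthandler` module can report which Python line was executing when the process received SIGSEGV. This is only a sketch: it shows Python frames, not the native C++ frames that gdb's `bt` prints, so it does not replace the stack trace requested here. The log file name `crash_traceback.log` is an arbitrary choice for illustration.

```python
import faulthandler

# Keep the log file open for the lifetime of the process; faulthandler
# writes to its file descriptor directly from the signal handler.
crash_log = open("crash_traceback.log", "w")

# Dump the Python traceback of all threads on SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS, and SIGILL.
faulthandler.enable(file=crash_log, all_threads=True)

# ... the rest of the script (imports, system setup, Simulation(...)) follows.
```

Placing these lines at the very top of the script means the handler is installed before any compiled extension is loaded. The same effect is available without editing the script by running `python -X faulthandler test.py`, which writes the traceback to stderr instead.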

@FranklinHu1
Author

I ran the code using the above instructions. This is what I got when I typed bt to get the stack trace:

#0  0x00002aaac2e4c0e8 in TorchPlugin::TorchForceImpl::initialize(OpenMM::ContextImpl&) () from /home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/plugins/../libOpenMMTorch.so
#1  0x00002aaac33284b7 in OpenMM::ContextImpl::initialize() () from /home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/openmm/../../../libOpenMM.so.8.0
#2  0x00002aaac33206cb in OpenMM::Context::Context(OpenMM::System const&, OpenMM::Integrator&) () from /home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/openmm/../../../libOpenMM.so.8.0
#3  0x00002aaac30d74ab in _wrap_new_Context () from /home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/openmm/_openmm.cpython-310-x86_64-linux-gnu.so
#4  0x00005555556997e2 in cfunction_call () at /usr/local/src/conda/python-3.10.6/Objects/methodobject.c:552
#5  0x00005555556a7e19 in _PyObject_Call (kwargs=<optimized out>, args=0x2aab70c5f840, callable=0x2aaac2e7ff10, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Objects/call.c:305
#6  PyObject_Call () at /usr/local/src/conda/python-3.10.6/Objects/call.c:317
#7  0x000055555568e93f in do_call_core (kwdict=0x0, callargs=0x2aab70c5f840, func=0x2aaac2e7ff10, trace_info=0x7fffffffd2a0, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5915
#8  _PyEval_EvalFrameDefault () at /usr/local/src/conda/python-3.10.6/Python/ceval.c:4277
#9  0x0000555555691882 in _PyEval_EvalFrame (throwflag=0, f=0x2aab70d59480, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5052
#10 _PyEval_Vector (kwnames=0x0, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x2aaac365ecc0, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5065
#11 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>, stack=<optimized out>, func=0x2aaac365ecb0) at /usr/local/src/conda/python-3.10.6/Objects/call.c:342
#12 _PyObject_FastCallDictTstate.localalias () at /usr/local/src/conda/python-3.10.6/Objects/call.c:142
#13 0x00005555556a48c1 in _PyObject_Call_Prepend (kwargs=0x0, args=0x2aab6f483ec0, obj=<optimized out>, callable=0x2aaac365ecb0, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Objects/call.c:431
#14 slot_tp_init () at /usr/local/src/conda/python-3.10.6/Objects/typeobject.c:7734
#15 0x000055555569269b in type_call (kwds=0x0, args=0x2aab6f483ec0, type=<optimized out>) at /usr/local/src/conda/python-3.10.6/Objects/call.c:224
#16 _PyObject_MakeTpCall.localalias () at /usr/local/src/conda/python-3.10.6/Objects/call.c:215
#17 0x000055555568e0f7 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=<optimized out>, callable=0x5555562c58a0, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:112
#18 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x2aab70b8f1f0, callable=0x5555562c58a0, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:99
#19 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2aab70b8f1f0, callable=0x5555562c58a0) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:123
#20 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffffffd590, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5891
#21 _PyEval_EvalFrameDefault () at /usr/local/src/conda/python-3.10.6/Python/ceval.c:4181
#22 0x0000555555691882 in _PyEval_EvalFrame (throwflag=0, f=0x2aab70b8f040, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5052
#23 _PyEval_Vector (kwnames=0x0, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x2aab6529a4e0, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5065
#24 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>, stack=<optimized out>, func=0x2aab6529a4d0) at /usr/local/src/conda/python-3.10.6/Objects/call.c:342
#25 _PyObject_FastCallDictTstate.localalias () at /usr/local/src/conda/python-3.10.6/Objects/call.c:142
#26 0x00005555556a48c1 in _PyObject_Call_Prepend (kwargs=0x0, args=0x2aab70ca8440, obj=<optimized out>, callable=0x2aab6529a4d0, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Objects/call.c:431
#27 slot_tp_init () at /usr/local/src/conda/python-3.10.6/Objects/typeobject.c:7734
#28 0x000055555569269b in type_call (kwds=0x0, args=0x2aab70ca8440, type=<optimized out>) at /usr/local/src/conda/python-3.10.6/Objects/call.c:224
#29 _PyObject_MakeTpCall.localalias () at /usr/local/src/conda/python-3.10.6/Objects/call.c:215
#30 0x000055555568daac in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x2aaaaab21ba8, callable=<optimized out>, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:112
#31 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x2aaaaab21ba8, callable=<optimized out>, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:99
#32 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2aaaaab21ba8, callable=<optimized out>) at /usr/local/src/conda/python-3.10.6/Include/cpython/abstract.h:123
#33 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffffffd880, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5891
#34 _PyEval_EvalFrameDefault () at /usr/local/src/conda/python-3.10.6/Python/ceval.c:4213
#35 0x000055555573ec62 in _PyEval_EvalFrame (throwflag=0, f=0x2aaaaab21a40, tstate=0x55555591eb40) at /usr/local/src/conda/python-3.10.6/Include/internal/pycore_ceval.h:46
#36 _PyEval_Vector () at /usr/local/src/conda/python-3.10.6/Python/ceval.c:5065
#37 0x000055555573eba7 in PyEval_EvalCode (co=co@entry=0x2aaab218d630, globals=globals@entry=0x2aaab2150f80, locals=locals@entry=0x2aaab2150f80) at /usr/local/src/conda/python-3.10.6/Python/ceval.c:1134
#38 0x00005555557726b9 in run_eval_code_obj () at /usr/local/src/conda/python-3.10.6/Python/pythonrun.c:1291
#39 0x000055555576cfd4 in run_mod () at /usr/local/src/conda/python-3.10.6/Python/pythonrun.c:1312
#40 0x00005555555ec34d in pyrun_file (fp=0x555555959720, filename=0x2aaab2155d10, start=257, globals=0x2aaab2150f80, locals=0x2aaab2150f80, closeit=1, flags=0x7fffffffdb78) at /usr/local/src/conda/python-3.10.6/Python/pythonrun.c:1208
#41 0x000055555576731f in _PyRun_SimpleFileObject.localalias () at /usr/local/src/conda/python-3.10.6/Python/pythonrun.c:456
#42 0x0000555555766ee3 in _PyRun_AnyFileObject.localalias () at /usr/local/src/conda/python-3.10.6/Python/pythonrun.c:90
#43 0x000055555576408f in pymain_run_file_obj (skip_source_first_line=0, filename=0x2aaab2155d10, program_name=0x2aaab2155f40) at /usr/local/src/conda/python-3.10.6/Modules/main.c:357
#44 pymain_run_file (config=0x555555902f20) at /usr/local/src/conda/python-3.10.6/Modules/main.c:376
#45 pymain_run_python (exitcode=0x7fffffffdb70) at /usr/local/src/conda/python-3.10.6/Modules/main.c:591
#46 Py_RunMain.localalias () at /usr/local/src/conda/python-3.10.6/Modules/main.c:670
#47 0x0000555555732389 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.10.6/Modules/main.c:1090
#48 0x00002aaaab81e555 in __libc_start_main () from /lib64/libc.so.6
#49 0x0000555555732291 in _start () at /usr/local/src/conda/python-3.10.6/Include/internal/pycore_long.h:24

@raimis raimis added the help wanted Extra attention is needed label Nov 8, 2022
@raimis
Contributor

raimis commented Nov 8, 2022

This may be related to #84.

@raimis
Contributor

raimis commented Nov 9, 2022

I have created the environment:

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge openmmtools
conda list
# packages in environment at /shared2/raimis/opt/miniconda/envs/openmm-8-beta-linux:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
blas                      2.116                  openblas    conda-forge
blas-devel                3.9.0           16_linux64_openblas    conda-forge
blosc                     1.21.1               h83bc5f7_3    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.24            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_2    conda-forge
cftime                    1.6.2           py310hde88566_1    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cryptography              38.0.3          py310h600f1e7_0    conda-forge
cudatoolkit               11.7.0              hd8887f6_10    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
curl                      7.86.0               h2283fc2_1    conda-forge
exceptiongroup            1.0.1              pyhd8ed1ab_0    conda-forge
h5py                      3.7.0           nompi_py310h416281c_102    conda-forge
hdf4                      4.2.15               h9772cbc_5    conda-forge
hdf5                      1.12.2          nompi_h4df4325_100    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        5.0.0              pyha770c72_1    conda-forge
importlib_metadata        5.0.0                hd8ed1ab_1    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.19.3               h08a2579_0    conda-forge
lark-parser               0.12.0             pyhd8ed1ab_0    conda-forge
ld_impl_linux-64          2.39                 hc81fddc_0    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.86.0               h2283fc2_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
liblapacke                3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_5    conda-forge
libnetcdf                 4.8.1           nompi_h261ec11_106    conda-forge
libnghttp2                1.47.0               hff17c54_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libprotobuf               3.21.9               h6239696_0    conda-forge
libsqlite                 3.39.4               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libxml2                   2.10.3               h7463322_0    conda-forge
libzip                    1.9.2                hc929e4a_1    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               14.0.4               he0ac6c6_0    conda-forge
llvmlite                  0.39.1          py310h58363a5_1    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
magma                     2.5.4                hc72dce7_4    conda-forge
mdtraj                    1.9.7           py310h902c554_4    conda-forge
mkl                       2022.1.0           h84fe81f_915    conda-forge
mpiplus                   v0.0.1          pyhd8ed1ab_1003    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
netcdf4                   1.6.1           nompi_py310h55e1e36_101    conda-forge
ninja                     1.11.0               h924138e_0    conda-forge
nnpops                    0.2             cuda112py310h8b99da5_5    conda-forge
nose                      1.3.7                   py_1006    conda-forge
numba                     0.56.3          py310ha5257ce_0    conda-forge
numexpr                   2.8.3           py310h757a811_0  
numpy                     1.23.4          py310h53a5b5f_1    conda-forge
ocl-icd                   2.3.1                h7f98852_0    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
openblas                  0.3.21          pthreads_h320a7e8_3    conda-forge
openmm                    8.0.0beta       py310h2996cf7_2    conda-forge/label/openmm_rc
openmm-ml                 1.0beta            pyh79ba5db_2    conda-forge/label/openmm_rc
openmm-torch              1.0beta         cuda112py310h02d4f52_1    conda-forge/label/openmm-torch_rc
openmmtools               0.21.5             pyhd8ed1ab_0    conda-forge
openssl                   3.0.7                h166bdaf_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.1           py310h769672d_1    conda-forge
pdbfixer                  1.8.1              pyh6c4a22f_0    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pymbar                    3.1.0           py310hde88566_1    conda-forge
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytables                  3.7.0           py310hb60b9b2_3    conda-forge
pytest                    7.2.0              pyhd8ed1ab_2    conda-forge
python                    3.10.6          ha86cf86_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    2_cp310    conda-forge
pytorch                   1.12.1          cuda112py310he33e0d6_201    conda-forge
pytz                      2022.6             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0             py310h5764c6d_5    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
scipy                     1.9.3           py310hdfbd76f_1    conda-forge
setuptools                59.5.0          py310hff52083_0    conda-forge
setuptools-scm            6.3.2              pyhd8ed1ab_0    conda-forge
setuptools_scm            6.3.2                hd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h9b69904_2    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
tbb                       2021.6.0             h924138e_1    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torchani                  2.2.2           cuda112py310h98dee98_6    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022f                h191b570_0    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
wheel                     0.38.2             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.10.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge

The script above runs without a problem:

python debug_88.py 
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
removing force
removing force
removing force
removing force
removing force

The main difference between my environment and @FranklinHu1's is the pytorch version: I got 1.12.1, while @FranklinHu1 has 1.11.0.

I suspect this may be related to conda-forge/openmm-torch-feedstock#20. @FranklinHu1, what version of conda are you using? Mine is 22.9.0.
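
Spotting a version difference like this by eye across two long package listings is error-prone. The following is a minimal sketch, not something used in this thread, that parses two `conda list` outputs and reports only the packages whose version or build differ. The function names `parse_conda_list` and `diff_envs` are made up for illustration, and the two short sample listings are trimmed to a few rows using values quoted in this issue.

```python
def parse_conda_list(text):
    """Map package name -> (version, build) from `conda list` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3:
            packages[fields[0]] = (fields[1], fields[2])
    return packages


def diff_envs(env_a, env_b):
    """Return {name: (entry_a, entry_b)} for differing packages.

    An entry is None when the package is missing from that environment.
    """
    names = sorted(set(env_a) | set(env_b))
    return {
        name: (env_a.get(name), env_b.get(name))
        for name in names
        if env_a.get(name) != env_b.get(name)
    }


# Trimmed sample listings (illustrative subset of the environments above).
listing_a = """\
# Name                    Version                   Build  Channel
numpy                     1.23.4          py310h53a5b5f_1    conda-forge
pytorch                   1.11.0          cuda112py310h51fe464_1    conda-forge
"""
listing_b = """\
# Name                    Version                   Build  Channel
numpy                     1.23.4          py310h53a5b5f_1    conda-forge
pytorch                   1.12.1          cuda112py310he33e0d6_201    conda-forge
"""

differences = diff_envs(parse_conda_list(listing_a), parse_conda_list(listing_b))
for name, (a, b) in differences.items():
    print(name, a, b)
```

In practice the two strings would come from files saved with `conda list > env.txt` on each machine. Note that the parser expects the column format of `conda list`; the `name=version=build` format of an exported environment file would need a different split.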

@sef43

sef43 commented Nov 9, 2022

I have been able to reproduce this error when the environment is created with conda env create, and I do not get the error when it is created with mamba env create.

Method to get error (conda version 22.9.0):

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge openmmtools
python debug.py
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
removing force
removing force
removing force
removing force
removing force
Segmentation fault (core dumped)

Method to run without error:

mamba env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
mamba install -c conda-forge openmmtools
python debug.py
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
removing force
removing force
removing force
removing force
removing force

This is the side-by-side diff of mamba list and conda list:

diff -y mamba_list.txt conda_list.txt 
# packages in environment at /export/users/sfarr/miniconda3/e |	# packages in environment at /export/users/sfarr/miniconda3/e
#								#
# Name                    Version                   Build  Ch	# Name                    Version                   Build  Ch
_libgcc_mutex             0.1                 conda_forge    	_libgcc_mutex             0.1                 conda_forge    
_openmp_mutex             4.5                       2_gnu     |	_openmp_mutex             4.5                  2_kmp_llvm    
astunparse                1.6.3              pyhd8ed1ab_0    	astunparse                1.6.3              pyhd8ed1ab_0    
attrs                     22.1.0             pyh71513ae_1    	attrs                     22.1.0             pyh71513ae_1    
blosc                     1.21.1               h83bc5f7_3    	blosc                     1.21.1               h83bc5f7_3    
brotlipy                  0.7.0           py310h5764c6d_1005 	brotlipy                  0.7.0           py310h5764c6d_1005 
bzip2                     1.0.8                h7f98852_4    	bzip2                     1.0.8                h7f98852_4    
c-ares                    1.18.1               h7f98852_0    	c-ares                    1.18.1               h7f98852_0    
ca-certificates           2022.9.24            ha878542_0    	ca-certificates           2022.9.24            ha878542_0    
cached-property           1.5.2                hd8ed1ab_1    	cached-property           1.5.2                hd8ed1ab_1    
cached_property           1.5.2              pyha770c72_1    	cached_property           1.5.2              pyha770c72_1    
certifi                   2022.9.24          pyhd8ed1ab_0    	certifi                   2022.9.24          pyhd8ed1ab_0    
cffi                      1.15.1          py310h255011f_2    	cffi                      1.15.1          py310h255011f_2    
cftime                    1.6.2           py310hde88566_1    	cftime                    1.6.2           py310hde88566_1    
charset-normalizer        2.1.1              pyhd8ed1ab_0    	charset-normalizer        2.1.1              pyhd8ed1ab_0    
colorama                  0.4.6              pyhd8ed1ab_0    	colorama                  0.4.6              pyhd8ed1ab_0    
cryptography              38.0.3          py310h600f1e7_0    	cryptography              38.0.3          py310h600f1e7_0    
cudatoolkit               11.7.0              hd8887f6_10    	cudatoolkit               11.7.0              hd8887f6_10    
cudnn                     8.4.1.50             hed8a83a_0    	cudnn                     8.4.1.50             hed8a83a_0    
curl                      7.86.0               h2283fc2_1    	curl                      7.86.0               h2283fc2_1    
exceptiongroup            1.0.1              pyhd8ed1ab_0    	exceptiongroup            1.0.1              pyhd8ed1ab_0    
h5py                      3.7.0           nompi_py310h416281c	h5py                      3.7.0           nompi_py310h416281c
hdf4                      4.2.15               h9772cbc_5    	hdf4                      4.2.15               h9772cbc_5    
hdf5                      1.12.2          nompi_h4df4325_100 	hdf5                      1.12.2          nompi_h4df4325_100 
icu                       70.1                 h27087fc_0    	icu                       70.1                 h27087fc_0    
idna                      3.4                pyhd8ed1ab_0    	idna                      3.4                pyhd8ed1ab_0    
importlib-metadata        5.0.0              pyha770c72_1    	importlib-metadata        5.0.0              pyha770c72_1    
importlib_metadata        5.0.0                hd8ed1ab_1    	importlib_metadata        5.0.0                hd8ed1ab_1    
iniconfig                 1.1.1              pyh9f0ad1d_0    	iniconfig                 1.1.1              pyh9f0ad1d_0    
intel-openmp              2022.1.0          h9e868ea_3769     <
jpeg                      9e                   h166bdaf_2    	jpeg                      9e                   h166bdaf_2    
keyutils                  1.6.1                h166bdaf_0    	keyutils                  1.6.1                h166bdaf_0    
krb5                      1.19.3               h08a2579_0    	krb5                      1.19.3               h08a2579_0    
lark-parser               0.12.0             pyhd8ed1ab_0    	lark-parser               0.12.0             pyhd8ed1ab_0    
ld_impl_linux-64          2.39                 hc81fddc_0    	ld_impl_linux-64          2.39                 hc81fddc_0    
libblas                   3.9.0            16_linux64_mkl    	libblas                   3.9.0            16_linux64_mkl    
libcblas                  3.9.0            16_linux64_mkl    	libcblas                  3.9.0            16_linux64_mkl    
libcurl                   7.86.0               h2283fc2_1    	libcurl                   7.86.0               h2283fc2_1    
libedit                   3.1.20191231         he28a2e2_2    	libedit                   3.1.20191231         he28a2e2_2    
libev                     4.33                 h516909a_1    	libev                     4.33                 h516909a_1    
libffi                    3.4.2                h7f98852_5    	libffi                    3.4.2                h7f98852_5    
libgcc-ng                 12.2.0              h65d4601_19    	libgcc-ng                 12.2.0              h65d4601_19    
libgfortran-ng            12.2.0              h69a702a_19    	libgfortran-ng            12.2.0              h69a702a_19    
libgfortran5              12.2.0              h337968e_19    	libgfortran5              12.2.0              h337968e_19    
libgomp                   12.2.0              h65d4601_19     <
libiconv                  1.17                 h166bdaf_0    	libiconv                  1.17                 h166bdaf_0    
liblapack                 3.9.0            16_linux64_mkl    	liblapack                 3.9.0            16_linux64_mkl    
libllvm11                 11.1.0               he0ac6c6_5    	libllvm11                 11.1.0               he0ac6c6_5    
libnetcdf                 4.8.1           nompi_h261ec11_106 	libnetcdf                 4.8.1           nompi_h261ec11_106 
libnghttp2                1.47.0               hff17c54_1    	libnghttp2                1.47.0               hff17c54_1    
libnsl                    2.0.0                h7f98852_0    	libnsl                    2.0.0                h7f98852_0    
libprotobuf               3.20.1               h6239696_4    	libprotobuf               3.20.1               h6239696_4    
libsqlite                 3.39.4               h753d276_0    	libsqlite                 3.39.4               h753d276_0    
libssh2                   1.10.0               hf14f497_3    	libssh2                   1.10.0               hf14f497_3    
libstdcxx-ng              12.2.0              h46fd767_19    	libstdcxx-ng              12.2.0              h46fd767_19    
libuuid                   2.32.1            h7f98852_1000    	libuuid                   2.32.1            h7f98852_1000    
libxml2                   2.10.3               h7463322_0    	libxml2                   2.10.3               h7463322_0    
libzip                    1.9.2                hc929e4a_1    	libzip                    1.9.2                hc929e4a_1    
libzlib                   1.2.13               h166bdaf_4    	libzlib                   1.2.13               h166bdaf_4    
							      >	llvm-openmp               15.0.4               he0ac6c6_0    
llvmlite                  0.39.1          py310h58363a5_1    	llvmlite                  0.39.1          py310h58363a5_1    
lz4-c                     1.9.3                h9c3ff4c_1    	lz4-c                     1.9.3                h9c3ff4c_1    
lzo                       2.10              h516909a_1000    	lzo                       2.10              h516909a_1000    
magma                     2.5.4                hc72dce7_4    	magma                     2.5.4                hc72dce7_4    
mdtraj                    1.9.7           py310h902c554_4    	mdtraj                    1.9.7           py310h902c554_4    
mkl                       2022.1.0           hc2b9512_224     |	mkl                       2022.1.0           h84fe81f_915    
mpiplus                   v0.0.1          pyhd8ed1ab_1003    	mpiplus                   v0.0.1          pyhd8ed1ab_1003    
nccl                      2.14.3.1             h0800d71_0    	nccl                      2.14.3.1             h0800d71_0    
ncurses                   6.3                  h27087fc_1    	ncurses                   6.3                  h27087fc_1    
netcdf4                   1.6.1           nompi_py310h55e1e36	netcdf4                   1.6.1           nompi_py310h55e1e36
ninja                     1.11.0               h924138e_0    	ninja                     1.11.0               h924138e_0    
nnpops                    0.2             cuda112py310h85a0d1 |	nnpops                    0.2             cuda112py310h8b99da
nose                      1.3.7                   py_1006    	nose                      1.3.7                   py_1006    
numba                     0.56.3          py310ha5257ce_0    	numba                     0.56.3          py310ha5257ce_0    
numexpr                   2.7.3           py310hb5077e9_1     |	numexpr                   2.8.3           mkl_py310h0afd4a5_1
numpy                     1.23.4          py310h53a5b5f_1    	numpy                     1.23.4          py310h53a5b5f_1    
ocl-icd                   2.3.1                h7f98852_0    	ocl-icd                   2.3.1                h7f98852_0    
ocl-icd-system            1.0.0                         1    	ocl-icd-system            1.0.0                         1    
openmm                    8.0.0beta       py310h2996cf7_2    	openmm                    8.0.0beta       py310h2996cf7_2    
openmm-ml                 1.0beta            pyh79ba5db_2    	openmm-ml                 1.0beta            pyh79ba5db_2    
openmm-torch              1.0beta         cuda112py310hcc28b4 |	openmm-torch              1.0beta         cuda112py310h02d4f5
openmmtools               0.21.5             pyhd8ed1ab_0    	openmmtools               0.21.5             pyhd8ed1ab_0    
openssl                   3.0.7                h166bdaf_0    	openssl                   3.0.7                h166bdaf_0    
packaging                 21.3               pyhd8ed1ab_0    	packaging                 21.3               pyhd8ed1ab_0    
pandas                    1.5.1           py310h769672d_1    	pandas                    1.5.1           py310h769672d_1    
pdbfixer                  1.8.1              pyh6c4a22f_0    	pdbfixer                  1.8.1              pyh6c4a22f_0    
pip                       22.3.1             pyhd8ed1ab_0    	pip                       22.3.1             pyhd8ed1ab_0    
pluggy                    1.0.0              pyhd8ed1ab_5    	pluggy                    1.0.0              pyhd8ed1ab_5    
pycparser                 2.21               pyhd8ed1ab_0    	pycparser                 2.21               pyhd8ed1ab_0    
pymbar                    3.1.0           py310hde88566_1    	pymbar                    3.1.0           py310hde88566_1    
pyopenssl                 22.1.0             pyhd8ed1ab_0    	pyopenssl                 22.1.0             pyhd8ed1ab_0    
pyparsing                 3.0.9              pyhd8ed1ab_0    	pyparsing                 3.0.9              pyhd8ed1ab_0    
pysocks                   1.7.1              pyha2e5f31_6    	pysocks                   1.7.1              pyha2e5f31_6    
pytables                  3.7.0           py310hb60b9b2_3    	pytables                  3.7.0           py310hb60b9b2_3    
pytest                    7.2.0              pyhd8ed1ab_2    	pytest                    7.2.0              pyhd8ed1ab_2    
python                    3.10.6          ha86cf86_0_cpython 	python                    3.10.6          ha86cf86_0_cpython 
python-dateutil           2.8.2              pyhd8ed1ab_0    	python-dateutil           2.8.2              pyhd8ed1ab_0    
python_abi                3.10                    2_cp310    	python_abi                3.10                    2_cp310    
pytorch                   1.11.0          cuda112py310h51fe46	pytorch                   1.11.0          cuda112py310h51fe46
pytz                      2022.6             pyhd8ed1ab_0    	pytz                      2022.6             pyhd8ed1ab_0    
pyyaml                    6.0             py310h5764c6d_5    	pyyaml                    6.0             py310h5764c6d_5    
readline                  8.1.2                h0f457ee_0    	readline                  8.1.2                h0f457ee_0    
requests                  2.28.1             pyhd8ed1ab_1    	requests                  2.28.1             pyhd8ed1ab_1    
scipy                     1.9.3           py310hdfbd76f_2    	scipy                     1.9.3           py310hdfbd76f_2    
setuptools                59.5.0          py310hff52083_0    	setuptools                59.5.0          py310hff52083_0    
setuptools-scm            6.3.2              pyhd8ed1ab_0    	setuptools-scm            6.3.2              pyhd8ed1ab_0    
setuptools_scm            6.3.2                hd8ed1ab_0    	setuptools_scm            6.3.2                hd8ed1ab_0    
six                       1.16.0             pyh6c4a22f_0    	six                       1.16.0             pyh6c4a22f_0    
sleef                     3.5.1                h9b69904_2    	sleef                     3.5.1                h9b69904_2    
snappy                    1.1.9                hbd366e4_2    	snappy                    1.1.9                hbd366e4_2    
							      >	tbb                       2021.6.0             h924138e_1    
tk                        8.6.12               h27826a3_0    	tk                        8.6.12               h27826a3_0    
tomli                     2.0.1              pyhd8ed1ab_0    	tomli                     2.0.1              pyhd8ed1ab_0    
torchani                  2.2.2           cuda112py310h73d5bc |	torchani                  2.2.2           cuda112py310h98dee9
typing_extensions         4.4.0              pyha770c72_0    	typing_extensions         4.4.0              pyha770c72_0    
tzdata                    2022f                h191b570_0    	tzdata                    2022f                h191b570_0    
urllib3                   1.26.11            pyhd8ed1ab_0    	urllib3                   1.26.11            pyhd8ed1ab_0    
wheel                     0.38.3             pyhd8ed1ab_0    	wheel                     0.38.3             pyhd8ed1ab_0    
xz                        5.2.6                h166bdaf_0    	xz                        5.2.6                h166bdaf_0    
yaml                      0.2.5                h7f98852_2    	yaml                      0.2.5                h7f98852_2    
zipp                      3.10.0             pyhd8ed1ab_0    	zipp                      3.10.0             pyhd8ed1ab_0    
zlib                      1.2.13               h166bdaf_4    	zlib                      1.2.13               h166bdaf_4    
zstd                      1.5.2                h6239696_4    	zstd                      1.5.2                h6239696_4  

Note that the working mamba environment does use pytorch 1.11.0.

@sef43

sef43 commented Nov 10, 2022

This is issue conda-forge/openmm-torch-feedstock#20

On my machine, conda installs the NNPOps, openmm-torch, and torchani builds that depend on pytorch=1.12, yet it installs pytorch=1.11 itself.

e.g. conda installs:

openmm-torch 1.0beta cuda112py310h02d4f52_1
-------------------------------------------
file name   : openmm-torch-1.0beta-cuda112py310h02d4f52_1.tar.bz2
name        : openmm-torch
version     : 1.0beta
build       : cuda112py310h02d4f52_1
build number: 1
size        : 154 KB
license     : MIT
subdir      : linux-64
url         : https://conda.anaconda.org/conda-forge/label/openmm-torch_rc/linux-64/openmm-torch-1.0beta-cuda112py310h02d4f52_1.tar.bz2
md5         : ef057661bfc0d4412ce57721cf45fded
timestamp   : 2022-10-11 16:22:05 UTC
constraints : 
  - pytorch =*=cuda*
dependencies: 
  - __glibc >=2.17
  - cudatoolkit >=11.2,<12
  - libgcc-ng >=12
  - libstdcxx-ng >=12
  - ocl-icd >=2.3.1,<3.0a0
  - ocl-icd-system
  - openmm >=8.0.0beta,<8.1.0a0
  - python >=3.10,<3.11.0a0
  - python_abi 3.10.* *_cp310
  - pytorch >=1.12.0,<1.13.0a0

Mamba installs the builds that were made for pytorch=1.11, together with pytorch=1.11 itself.

e.g.:

openmm-torch 1.0beta cuda112py310hcc28b43_1
-------------------------------------------
file name   : openmm-torch-1.0beta-cuda112py310hcc28b43_1.tar.bz2
name        : openmm-torch
version     : 1.0beta
build       : cuda112py310hcc28b43_1
build number: 1
size        : 153 KB
license     : MIT
subdir      : linux-64
url         : https://conda.anaconda.org/conda-forge/label/openmm-torch_rc/linux-64/openmm-torch-1.0beta-cuda112py310hcc28b43_1.tar.bz2
md5         : d0061658915c6de9eeff773d620cecc3
timestamp   : 2022-10-11 16:21:47 UTC
constraints : 
  - pytorch =*=cuda*
dependencies: 
  - __glibc >=2.17
  - cudatoolkit >=11.2,<12
  - libgcc-ng >=12
  - libstdcxx-ng >=12
  - ocl-icd >=2.3.1,<3.0a0
  - ocl-icd-system
  - openmm >=8.0.0beta,<8.1.0a0
  - python >=3.10,<3.11.0a0
  - python_abi 3.10.* *_cp310
  - pytorch >=1.11.0,<1.12.0a0
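
To make the difference between the two builds explicit, here is a minimal sketch that compares the two dependency lists shown above. The helper name `dependency_diff` is hypothetical and only for illustration; the lists are copied from the `conda search -i` style output in this comment.

```python
def dependency_diff(deps_a, deps_b):
    """Return the entries that appear in only one of two dependency lists."""
    only_a = sorted(set(deps_a) - set(deps_b))
    only_b = sorted(set(deps_b) - set(deps_a))
    return only_a, only_b


# Dependencies shared by both openmm-torch 1.0beta builds listed above.
COMMON = [
    "__glibc >=2.17",
    "cudatoolkit >=11.2,<12",
    "libgcc-ng >=12",
    "libstdcxx-ng >=12",
    "ocl-icd >=2.3.1,<3.0a0",
    "ocl-icd-system",
    "openmm >=8.0.0beta,<8.1.0a0",
    "python >=3.10,<3.11.0a0",
    "python_abi 3.10.* *_cp310",
]

# Build installed by conda (cuda112py310h02d4f52_1).
conda_build = COMMON + ["pytorch >=1.12.0,<1.13.0a0"]
# Build installed by mamba (cuda112py310hcc28b43_1).
mamba_build = COMMON + ["pytorch >=1.11.0,<1.12.0a0"]

only_conda, only_mamba = dependency_diff(conda_build, mamba_build)
print("only in conda build:", only_conda)
print("only in mamba build:", only_mamba)
```

The only difference is the pytorch pin, which is consistent with the two builds being identical apart from the PyTorch version they were compiled against.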

@raimis
Contributor

raimis commented Nov 10, 2022

@sef43 good catch!

Still, I don't understand why I get PyTorch 1.12.1 and you get 1.11.0 with conda. Shouldn't it install the latest version available?

@raimis
Contributor

raimis commented Nov 10, 2022

@FranklinHu1 could you install with mamba?

@sef43

sef43 commented Nov 10, 2022

@sef43 good catch!

Still, I don't understand why I get PyTorch 1.12.1 and you get 1.11.0 with conda. Shouldn't it install the latest version available?

I have discovered the reason for this. I was using conda on a Linux cluster head node which does not have a proper CUDA installation (CUDA is only loaded when you run jobs on compute nodes).

The versions of pytorch-1.12.1-cuda112py310* from conda-forge all have:

dependencies: 
  - __cuda

So conda will not install it because the virtual dependency is not satisfied.
This can be overridden by setting the CONDA_OVERRIDE_CUDA environment variable to a version, e.g.:
export CONDA_OVERRIDE_CUDA=11.6

There are builds of pytorch-1.11.0-cuda112py310* which do not have the __cuda dependency, so they can be installed by conda.

@FranklinHu1
Author

FranklinHu1 commented Nov 10, 2022

Yes, I have been able to run my debugging script by creating the openmm beta environment using mamba instead of conda as @sef43 suggested. I am now having problems loading my more advanced PyTorch models due to their dependency on packages like torch cluster and torch geometric.

For torch cluster alone, I tried installing with conda, using the command conda install -c conda-forge pytorch_cluster
inside my functioning openmm-8-beta-linux environment. However, a simple import torch_cluster then produced the following error message:

Traceback (most recent call last):
  File "/home/frankhu/openmm_water_MD/torchforce_example_script.py", line 9, in <module>
    import torch_cluster
  File "/home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/__init__.py", line 18, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/frankhu/mambaforge/envs/openmm-8-beta-linux/lib/python3.10/site-packages/torch_cluster/_grid_cuda.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowEllb

This is the same problem encountered with torch cluster in issue #87. In any case, I think my specific issue with the initial debugging script I posted is resolved by the mamba environment workaround, and I am now following issue #87 for the fix to torch cluster and the other torch dependencies.

@peastman
Member

I still can't reproduce this, even using conda and PyTorch 1.12.1. Here's the sequence of commands I typed:

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge openmmtools cudatoolkit=11.6
python test.py

test.py contains the script in the first post above. It runs without problem.

environment
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
astunparse                1.6.3              pyhd8ed1ab_0    conda-forge
attrs                     22.1.0             pyh71513ae_1    conda-forge
blosc                     1.21.1               h83bc5f7_3    conda-forge
brotlipy                  0.7.0           py310h5764c6d_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.9.24            ha878542_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2022.9.24          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1          py310h255011f_2    conda-forge
cftime                    1.6.2           py310hde88566_1    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cryptography              38.0.3          py310h600f1e7_0    conda-forge
cudatoolkit               11.6.0              hecad31d_10    conda-forge
cudnn                     8.4.1.50             hed8a83a_0    conda-forge
curl                      7.86.0               h2283fc2_1    conda-forge
exceptiongroup            1.0.4              pyhd8ed1ab_0    conda-forge
h5py                      3.7.0           nompi_py310h416281c_102    conda-forge
hdf4                      4.2.15               h9772cbc_5    conda-forge
hdf5                      1.12.2          nompi_h4df4325_100    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
importlib-metadata        5.0.0              pyha770c72_1    conda-forge
importlib_metadata        5.0.0                hd8ed1ab_1    conda-forge
iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
krb5                      1.19.3               h08a2579_0    conda-forge
lark-parser               0.12.0             pyhd8ed1ab_0    conda-forge
ld_impl_linux-64          2.39                 hcc3a1bd_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.86.0               h2283fc2_1    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libllvm11                 11.1.0               he0ac6c6_5    conda-forge
libnetcdf                 4.8.1           nompi_h261ec11_106    conda-forge
libnghttp2                1.47.0               hff17c54_1    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libprotobuf               3.21.9               h6239696_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libxml2                   2.10.3               h7463322_0    conda-forge
libzip                    1.9.2                hc929e4a_1    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
llvm-openmp               15.0.5               he0ac6c6_0    conda-forge
llvmlite                  0.39.1          py310h58363a5_1    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
magma                     2.5.4                hc72dce7_4    conda-forge
mdtraj                    1.9.7           py310h902c554_4    conda-forge
mkl                       2022.1.0           h84fe81f_915    conda-forge
mpiplus                   v0.0.1          pyhd8ed1ab_1003    conda-forge
nccl                      2.14.3.1             h0800d71_0    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
netcdf4                   1.6.2           nompi_py310h55e1e36_100    conda-forge
ninja                     1.11.0               h924138e_0    conda-forge
nnpops                    0.2             cuda112py310h8b99da5_5    conda-forge
nose                      1.3.7                   py_1006    conda-forge
numba                     0.56.4          py310ha5257ce_0    conda-forge
numexpr                   2.7.3           py310hb5077e9_1    conda-forge
numpy                     1.23.5          py310h53a5b5f_0    conda-forge
ocl-icd                   2.3.1                h7f98852_0    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
openmm                    8.0.0beta       py310h2996cf7_2    conda-forge/label/openmm_rc
openmm-ml                 1.0beta            pyh79ba5db_2    conda-forge/label/openmm_rc
openmm-torch              1.0beta         cuda112py310h02d4f52_1    conda-forge/label/openmm-torch_rc
openmmtools               0.21.5             pyhd8ed1ab_0    conda-forge
openssl                   3.0.7                h166bdaf_0    conda-forge
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pandas                    1.5.2           py310h769672d_0    conda-forge
pdbfixer                  1.8.1              pyh6c4a22f_0    conda-forge
pip                       22.3.1             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pymbar                    3.1.0           py310hde88566_1    conda-forge
pyopenssl                 22.1.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytables                  3.7.0           py310hb60b9b2_3    conda-forge
pytest                    7.2.0              pyhd8ed1ab_2    conda-forge
python                    3.10.8          h4a9ceb5_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    3_cp310    conda-forge
pytorch                   1.12.1          cuda112py310he33e0d6_201    conda-forge
pytz                      2022.6             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0             py310h5764c6d_5    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_1    conda-forge
scipy                     1.9.3           py310hdfbd76f_2    conda-forge
setuptools                59.5.0          py310hff52083_0    conda-forge
setuptools-scm            6.3.2              pyhd8ed1ab_0    conda-forge
setuptools_scm            6.3.2                hd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h9b69904_2    conda-forge
snappy                    1.1.9                hbd366e4_2    conda-forge
tbb                       2021.7.0             h924138e_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torchani                  2.2.2           cuda112py310h98dee98_6    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022f                h191b570_0    conda-forge
urllib3                   1.26.11            pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.10.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h6239696_4    conda-forge

@sef43

sef43 commented Nov 24, 2022

I still can't reproduce this, even using conda and PyTorch 1.12.1. Here's the sequence of commands I typed:

conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge openmmtools cudatoolkit=11.6
python test.py

test.py contains the script in the first post above. It runs without problem.

environment

@peastman The error seems to occur when you use conda in a Linux environment which does not have CUDA available. Conda can detect whether CUDA is available. To see what it detected, run

conda info

and check the virtual packages line. For me, on a Linux node with CUDA:

    ...
       virtual packages : __cuda=11.6=0
                          __linux=4.15.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
    ...

If I run your commands in this Linux environment, it works.

However, on a Linux node which does not have CUDA available, the output of conda info does not include a CUDA virtual package:

    ...
       virtual packages : __linux=4.15.0=0
                          __glibc=2.27=0
                          __unix=0=0
                          __archspec=1=x86_64
    ...

Then when I run the commands I get the segmentation fault (because conda installs the incompatible pytorch).
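
As a rough way to automate this check, the sketch below parses the text of `conda info` in the layout shown above and reports whether a `__cuda` virtual package was detected. The function name `virtual_packages` is hypothetical, and the parser assumes the "key : value" layout with indented continuation lines seen in the two excerpts; other conda versions may format the output differently.

```python
def virtual_packages(conda_info_text):
    """Return {name: version} for the 'virtual packages' field of `conda info` text."""
    packages = {}
    in_block = False
    for line in conda_info_text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            # A "key : value" line starts a new field.
            in_block = key.strip() == "virtual packages"
        else:
            # No separator: treat the whole line as a continuation value.
            value = line
        entry = value.strip()
        if in_block and entry.startswith("__"):
            # Entries look like "__cuda=11.6=0" (name=version=build).
            parts = entry.split("=")
            packages[parts[0]] = parts[1] if len(parts) > 1 else ""
    return packages


def has_cuda(conda_info_text):
    """True if conda reported a __cuda virtual package."""
    return "__cuda" in virtual_packages(conda_info_text)
```

In practice the text could come from `subprocess.run(["conda", "info"], capture_output=True, text=True).stdout`; if `has_cuda` returns False on the machine where the environment is created, the mismatch described in this thread is likely to apply.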

You should be able to simulate this by overriding the conda virtual package (see https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html):

export CONDA_OVERRIDE_CUDA=""

I can reproduce the segmentation fault, on a Linux node which has CUDA, with these lines:

export CONDA_OVERRIDE_CUDA=""
conda env create mmh/openmm-8-beta-linux
conda activate openmm-8-beta-linux
conda install -c conda-forge openmmtools cudatoolkit=11.6
python test.py

Output:

Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
removing force
removing force
removing force
removing force
removing force
Segmentation fault (core dumped)

@peastman
Member

Thanks, I can reproduce it with that sequence of commands.

I compared environments created with mamba (which works) and conda (which fails). They install identical builds of pytorch, so that isn't the problem. But they install different builds of openmm-torch. Conda installs cuda112py310h02d4f52_1 and mamba installs cuda112py310hcc28b43_1. Looking through the list of files, you can see there are two packages for every combination of OS, Python version, and CUDA version. That's because the recipe tells it to build for both PyTorch 1.11 and 1.12:

pytorch:
  - 1.11.0
  - 1.12.0

I'm not sure how to determine which of the two packages is which, but presumably this means conda is incorrectly installing the wrong one for the PyTorch version it has installed?

@peastman
Member

@mikemhenry do you have any idea what might be causing the behavior described above (#88 (comment))? Is this a bug in conda, or is there a problem in how we specify the constraints in the recipe?

@sef43

sef43 commented Dec 2, 2022

My suspicion is that the behaviour arises because all the pytorch=*=cuda* builds uploaded to conda-forge since this change depend on "__cuda". That dependency will not be satisfied if you are running on Linux without CUDA.

You can view this by running:
conda search -c conda-forge pytorch=1.12.1=cuda* -i

example output
pytorch 1.12.1 cuda112py39hb0b7ed5_201
--------------------------------------
file name   : pytorch-1.12.1-cuda112py39hb0b7ed5_201.tar.bz2
name        : pytorch
version     : 1.12.1
build       : cuda112py39hb0b7ed5_201
build number: 201
size        : 490.1 MB
license     : BSD-3-Clause
subdir      : linux-64
url         : https://conda.anaconda.org/conda-forge/linux-64/pytorch-1.12.1-cuda112py39hb0b7ed5_201.tar.bz2
md5         : 0775149d450c7a7d776595987ca8f171
timestamp   : 2022-09-28 23:44:25 UTC
constraints : 
  - pytorch-cpu = 99999999
  - pytorch-gpu = 1.12.1
dependencies: 
  - __cuda
  - __glibc >=2.17
  - __glibc >=2.17,<3.0.a0
  - _openmp_mutex >=4.5
  - cffi
  - cudatoolkit >=11.2,<12
  - cudnn >=8.4.1.50,<9.0a0
  - libcblas >=3.9.0,<4.0a0
  - libgcc-ng >=12
  - libprotobuf >=3.21.6,<3.22.0a0
  - libstdcxx-ng >=12
  - magma >=2.5.4,<2.5.5.0a0
  - mkl >=2022.1.0,<2023.0a0
  - nccl >=2.14.3.1,<3.0a0
  - ninja
  - numpy >=1.20.3,<2.0a0
  - python >=3.9,<3.10.0a0
  - python_abi 3.9.* *_cp39
  - setuptools
  - sleef >=3.5.1,<4.0a0
  - typing_extensions

Conda will not be able to install them unless it detects CUDA on your system. However, the older uploads of pytorch=1.11=cuda* do not have this dependency. Conda, but not Mamba, seems to struggle with this. There are no Linux CPU-only builds of openmm-torch on conda-forge.

I believe the fix is either to provide a Linux CPU-only build of openmm-torch which depends on pytorch=*=cpu* with the constraint pytorch=*=cpu*, or to tell users to check the output of conda info and, if necessary, use export CONDA_OVERRIDE_CUDA="11.X" to target the CUDA version of the compute node they will be using.
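
For users who script their environment creation, the second option could be sketched as follows. This is only an illustration: the helper names `env_with_cuda_override` and `create_env` are hypothetical, and the CUDA version is a placeholder that should match the compute nodes. `CONDA_OVERRIDE_CUDA` is the variable described in the conda documentation linked earlier in this thread.

```python
import os
import subprocess


def env_with_cuda_override(cuda_version, base_env=None):
    """Return a copy of the environment with CONDA_OVERRIDE_CUDA set.

    The original mapping is left untouched, so the override only affects
    child processes started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    return env


def create_env(cuda_version, env_spec="mmh/openmm-8-beta-linux"):
    """Run `conda env create` as if a GPU with the given CUDA version were present."""
    return subprocess.run(
        ["conda", "env", "create", env_spec],
        env=env_with_cuda_override(cuda_version),
        check=True,
    )
```

Calling `create_env("11.6")` on a login node would be the scripted equivalent of running export CONDA_OVERRIDE_CUDA=11.6 before the conda env create command used above.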

@peastman
Member

peastman commented Dec 2, 2022

That explains why it installs pytorch 1.11 instead of 1.12. But in that case, it ought to install the openmm-torch package that was built against pytorch 1.11.

I don't really understand the constraints in the recipe. In the build: section it lists

    - pytorch                                # [build_platform != target_platform]
    - pytorch =*={{ torch_proc_type }}*      # [build_platform != target_platform]

Those are only used when cross compiling, so I don't think they're relevant. Then they're specified under host:

    # Leaving two dependencies helps rerender correctly
    # The first gets filled in by the global pinnings
    # The second gets the processor type
    - pytorch
    - pytorch =*={{ torch_proc_type }}*

And finally it includes this constraint.

  run_constrained:
    # 2022/02/05 hmaarrfk
    # While conda packaging seems to allow us to specify
    # constraints on the same package in different lines
    # the resulting package doesn't have the ability to
    # be specified in multiples lines
    # This makes it tricky to use run_exports
    # we add the GPU constraint in the run_constrained
    # to allow us to have "two" constraints on the
    # running package
    - pytorch =*={{ torch_proc_type }}*

@peastman
Member

peastman commented Dec 6, 2022

@hmaarrfk since your name is mentioned in the comment above, I wondered if you had any idea about this issue? The short version is that openmm-torch gets built against two pytorch versions, 1.11 and 1.12. When installing with conda on a computer that doesn't support CUDA, it installs pytorch 1.11 but the openmm-torch package that was built against 1.12. That leads to a segfault.

On the other hand, if you install with mamba, it correctly installs the package that was built against 1.11 and therefore works. (Either way, it never installs pytorch 1.12. It seems the newer packages aren't supported on computers without CUDA?)

@hmaarrfk

hmaarrfk commented Dec 7, 2022

I think I used a "feature" or a happy mistake of mamba, and not conda. Conda seems to ignore the constraint, leading to incompatible versions being installed.

I'm not sure of another syntax that would work with our migration pipeline. You are free to try something.

You can also try to use conda-libmamba-solver.

One thing that may have changed:

We've been somewhat convinced that we should use higher build numbers to help prioritize CUDA builds for machines that support them. Given this, we can probably adjust the run export to pin GPU builds to the GPU, and CPU to CPU or GPU. Given the constraints from the overall environment, it is likely that the versions will be correctly installed.

@peastman
Member

peastman commented Dec 7, 2022

Thanks! That's helpful information.

You can also try to use conda-libmamba-solver.

Is it possible to include that in an environment file so it would automatically be used when building an environment? Or would we need to tell users to install it first? If the latter, it's probably simpler to just tell them to use mamba.

I'm trying to dig into the built packages to understand better what is happening. Here is the index.json file for the package installed by conda. It's the one that was built against pytorch 1.12, but incorrectly gets installed with 1.11.

{
  "arch": "x86_64",
  "build": "cuda112py310h02d4f52_1",
  "build_number": 1,
  "constrains": [
    "pytorch =*=cuda*"
  ],
  "depends": [
    "__glibc >=2.17",
    "cudatoolkit >=11.2,<12",
    "libgcc-ng >=12",
    "libstdcxx-ng >=12",
    "ocl-icd >=2.3.1,<3.0a0",
    "ocl-icd-system",
    "openmm >=8.0.0beta,<8.1.0a0",
    "python >=3.10,<3.11.0a0",
    "python_abi 3.10.* *_cp310",
    "pytorch >=1.12.0,<1.13.0a0"
  ],
  "license": "MIT",
  "license_family": "MIT",
  "name": "openmm-torch",
  "platform": "linux",
  "subdir": "linux-64",
  "timestamp": 1665505325676,
  "version": "1.0beta"
}

There are a few things I notice about this. First, the constrains section just specifies pytorch =*=cuda*. There's no version mentioned. On the other hand, constrains is not one of the keys mentioned in the documentation. So is this expected to do anything at all, or is it just there for information?

On the other hand, the depends section specifies pytorch >=1.12.0,<1.13.0a0. That definitely ought to require pytorch 1.12, so how is conda installing it with 1.11?
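For anyone who wants to see which combination actually landed in their environment, here is a minimal sketch. It assumes the usual conda layout, where each installed package has a JSON record under `<prefix>/conda-meta` with `name`, `version`, `build`, `depends`, and `constrains` keys. The helper names `installed_records` and `summarize` are made up for this illustration and are not part of conda.

```python
import json
from pathlib import Path


def installed_records(env_prefix, names):
    """Return {name: record} for the requested packages in a conda environment.

    Assumes conda's layout of one JSON file per package in <prefix>/conda-meta.
    """
    found = {}
    for path in sorted(Path(env_prefix, "conda-meta").glob("*.json")):
        record = json.loads(path.read_text())
        if record.get("name") in names:
            found[record["name"]] = record
    return found


def summarize(records):
    """One line per package: version, build string, and any specs that name pytorch."""
    lines = []
    for name, rec in sorted(records.items()):
        specs = [
            spec
            for spec in rec.get("depends", []) + rec.get("constrains", [])
            if spec.partition(" ")[0] == "pytorch"
        ]
        lines.append(f"{name} {rec['version']} {rec['build']} pytorch specs: {specs}")
    return lines
```

Run inside the affected environment, `summarize(installed_records(sys.prefix, {"openmm-torch", "pytorch"}))` (after `import sys`) should show the installed pytorch version next to the range that openmm-torch declares, which makes a mismatch like the one described here easy to spot.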

Given this we can probably adjust the run export to pin GPU builds to the GPU, and CPU to CPU or GPU.

Can you explain how that would work? I've never been completely clear on the relationship between the pytorch and pytorch-cpu packages. And I see that 1.13 has added a new pytorch-gpu package. The ideal behavior is that it should always install a version of pytorch that's capable of using CUDA, even if there isn't a compatible GPU or driver on that computer. It's pretty common for clusters to have GPUs on the compute nodes, but not on the login nodes people use to install software.

@hmaarrfk

hmaarrfk commented Dec 7, 2022

I really don't like pytorch-gpu as a name. I think it makes writing recipes harder.

Constraints were added in conda build conda/conda-build#2001

@peastman
Member

peastman commented Dec 7, 2022

Got it, thanks. So that leaves two questions.

  1. Should the constrains section specify a version for pytorch?
  2. Why doesn't pytorch >=1.12.0,<1.13.0a0 under depends prevent conda from installing it with pytorch 1.11?

Perhaps conda doesn't expect the same package to be listed in both places? Since it is, the specification in constrains (which doesn't specify a version) takes precedence over the one in depends (which does)?

@hmaarrfk

hmaarrfk commented Dec 7, 2022

  1. The constraints and the run requirements should intersect and give you the exact version you need.
  2. It should. That was my theory. It seems that conda ignores the run requirement in depends and only uses the constraint.

The thing is that the depends line comes from the pytorch package itself. It is in charge of deciding what versions it is compatible with. However, I don't know a syntax to say version AND build string requirement. So I tried to use the constraint to add the build string requirement.
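The intended intersection can be sketched as below. This is a simplified illustration and not how conda or mamba actually parse specs: `version_key` handles only plain dotted versions with an optional pre-release suffix such as `a0`, and the candidate build strings are hypothetical. Only the range `>=1.12.0,<1.13.0a0` and the glob `cuda*` come from the index.json above.

```python
import operator
import re
from fnmatch import fnmatchcase

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def version_key(text):
    """Simplified ordering key: padded numeric parts, then 0 if a pre-release suffix is present, else 1."""
    numbers, final = [], 1
    for piece in text.split("."):
        match = re.fullmatch(r"(\d+)([a-z]+\d*)?", piece)
        if match is None:
            raise ValueError(f"unsupported version component: {piece!r}")
        numbers.append(int(match.group(1)))
        if match.group(2):
            final = 0  # e.g. 1.13.0a0 sorts before 1.13.0
    numbers += [0] * (4 - len(numbers))
    return tuple(numbers), final


def version_ok(version, spec):
    """Check a comma-separated range such as '>=1.12.0,<1.13.0a0'; '*' accepts anything."""
    if spec == "*":
        return True
    for clause in spec.split(","):
        match = re.fullmatch(r"(>=|<=|==|>|<)(.+)", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not OPS[op](version_key(version), version_key(bound)):
            return False
    return True


def satisfies_both(version, build, depends_range, build_glob):
    """A candidate is acceptable only if it meets the depends range AND the constrains build glob."""
    return version_ok(version, depends_range) and fnmatchcase(build, build_glob)


# Hypothetical pytorch candidates as (version, build string).
candidates = [
    ("1.11.0", "cuda112py310_0"),
    ("1.12.1", "cpu_py310_0"),
    ("1.12.1", "cuda112py310_0"),
]
accepted = [c for c in candidates
            if satisfies_both(c[0], c[1], ">=1.12.0,<1.13.0a0", "cuda*")]
```

Under these assumptions only the 1.12.1 CUDA candidate survives. Dropping the version check, which is what conda appears to do here, would also let the 1.11.0 CUDA candidate through.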

@peastman
Member

peastman commented Dec 8, 2022

What would happen if we just left out the run_constrained line? In that case it would always install the one for the right version of pytorch, but it wouldn't necessarily pick the gpu vs. cpu correctly, is that right? In that case, how would it decide whether or not to install a GPU enabled build of pytorch?

@mikemhenry
Collaborator

Thanks @hmaarrfk for helping out here!

@hmaarrfk

hmaarrfk commented Dec 8, 2022

What would happen if we just left out the run_constrained line? In that case it would always install the one for the right version of pytorch, but it wouldn't necessarily pick the gpu vs. cpu correctly, is that right?

Correct.

In that case, how would it decide whether or not to install a GPU enabled build of pytorch?

Theoretically, the higher build number of recent pytorch versions should cause conda and mamba to prefer the GPU builds, if possible.
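As a rough illustration of that tie-break, here is a toy model (not the real solver, which weighs many more criteria). The `Candidate` type, the build strings, and the build numbers are all hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    version: tuple      # e.g. (1, 12, 1)
    build: str          # build string
    build_number: int   # higher is preferred when versions tie


def preferred(candidates):
    """Pick the newest version, breaking ties with the highest build number."""
    return max(candidates, key=lambda c: (c.version, c.build_number))


# Hypothetical: same pytorch version, with the GPU build given the larger build number.
candidates = [
    Candidate((1, 12, 1), "cpu_py310_0", 0),
    Candidate((1, 12, 1), "cuda112py310_200", 200),
]
choice = preferred(candidates)
```

In this toy model the CUDA build wins purely because of its larger build number, with no build-string constraint needed.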

Honestly, my recommendation would be:

  1. include minimum constraints in your environment files. pytorch >=1.12.0 is a good one.
  2. Recommend, or nudge users toward mamba. It's faster (better user experience) and avoids this problem.

We can also try to export the GPU requirement at build time. This would require updated requirements. What I was scared of is that if you build locally, it may:

  1. Pull in a GPU package for a "CPU only requirement".
  2. Make GPU required for that package, even though it would work with pytorch-cpu.

I think this case might be rare, but I think it would be very confusing to debug.

@mikemhenry
Collaborator

@peastman I can update the environment files, I can also grep our docs and switch recommendations to use mamba.

I can also do some builds with the second half of suggestions, but I do worry about making things harder to debug.

@peastman
Member

peastman commented Dec 8, 2022

include minimum constraints in your environment files. pytorch >=1.12.0 is a good one.

We'll still have to build against multiple versions of pytorch. We build native libraries that link to libtorch, which isn't binary compatible across major releases.

Recommend, or nudge users toward mamba. It's faster (better user experience) and avoids this problem.

Definitely!

It sounds like removing the run_constrained ought to be a reasonable solution for us. Having it automatically prefer GPU builds is exactly what we want. At worst it means a larger download. And even if they're installing on a computer without CUDA, that may just mean it's the login node of a cluster and they'll still need GPU support later.

@mikemhenry what do you think?

@mikemhenry
Collaborator

@peastman Sounds good! I will get a new build out the door asap!

@mikemhenry
Collaborator

PR here: conda-forge/openmm-torch-feedstock#30

@mikemhenry
Collaborator

Adding pytorch 1.13 here conda-forge/openmm-torch-feedstock#31

@peastman
Member

peastman commented Jan 6, 2023

Merging conda-forge/openmm-torch-feedstock#31 appears to have fixed the problem. After installing with conda as described above, I can now run the script without it crashing.

@sef43

sef43 commented Jan 17, 2023

Yes, also fixed for me.

@peastman
Member

Thanks! I'll close this then.

@sef43 sef43 mentioned this issue Apr 11, 2023
2 tasks