Compilation #116

Open
BJWiley233 opened this issue Mar 2, 2024 · 5 comments

@BJWiley233

Building NNPOps currently requires cudatoolkit 11.* and a gxx 10.3 compiler. These are going to be outdated. When I try to compile with my own GNU gcc/g++ 11.4 and CUDA 12.3, I get this error:

CMake Error at /home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Targets.cmake:144 (message):
  The imported target "c10_cuda" references the file

     "/home/coyote/miniconda3/envs/nnpops/lib/libc10_cuda.so"

  but this file does not exist.  Possible reasons include:

  * The file was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and contained

     "/home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Targets.cmake"

  but not all the files it references.

Call Stack (most recent call first):
  /home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Config.cmake:113 (include)
  /home/coyote/miniconda3/envs/nnpops/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  CMakeLists.txt:13 (find_package)
@peastman
Member

peastman commented Mar 2, 2024

You may have some sort of conflict in your environment. Possibly you have an incompatible version of PyTorch installed? There are conda packages for CUDA 12, so it definitely can compile.
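
One quick way to check for this kind of conflict is to look for the library that CMake complains about. The sketch below is only illustrative: `has_c10_cuda` is a hypothetical helper name, and it assumes a CUDA build of PyTorch would place `libc10_cuda.so` in the directory you pass in (CPU-only builds do not ship that file).

```shell
# Hypothetical helper (not part of NNPOps or PyTorch): prints "found" if
# libc10_cuda.so exists in the given directory, "missing" otherwise.
has_c10_cuda() {
    libdir="$1"
    if [ -f "$libdir/libc10_cuda.so" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

# Example: check the lib directory of the active conda environment, if any.
if [ -n "${CONDA_PREFIX:-}" ]; then
    has_c10_cuda "$CONDA_PREFIX/lib"
fi
```

If this reports the file as missing, running `python -c 'import torch; print(torch.version.cuda)'` and seeing `None` would suggest that a CPU-only PyTorch ended up in the environment.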

We should update the environment.yml file in this repository. Replace cudatoolkit with cuda-version, and probably specify a newer PyTorch.

@BJWiley233
Author

BJWiley233 commented Mar 2, 2024

Thanks,

I got it to install with gxx_linux-64 11.3.0, pytorch-gpu from the pytorch channel, and the general cudatoolkit package. I am now seeing some gaps between @jharrymoore/openmmtools (MACE) and OpenMM Simulations, which require a Platform with platformProperties; I will eventually submit a PR for that.

I'll try later with CUDA 12; however, I was hoping to just use my base CUDA installation since I am installing on PCs.

@RaulPPelaez
Contributor

The cudatoolkit package does not include nvcc. The conda-forge "nvcc" package is just a metapackage that links to your system nvcc, so the two can easily get out of sync.
I agree we should update the env file with the new conda-forge CUDA packages. This would limit it to CUDA >= 12, but I think that is OK. The nvidia channel can be used for earlier versions if need be.
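
To see whether the system nvcc has drifted away from the CUDA version the environment expects, something like the following may help. It is a sketch under assumptions: `nvcc_release` is a hypothetical helper, and it assumes `nvcc --version` prints a line of the form `Cuda compilation tools, release X.Y, ...`.

```shell
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# only the release number (for example "12.3").
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example: print the release of whichever nvcc is first on PATH, if any.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | nvcc_release
fi
```

The printed release can then be compared by eye with the CUDA version of the installed PyTorch build or cudatoolkit package.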

@cagrikymk

I am having an issue with the compilation with CUDA 12.4.
The error I get:
/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:70 (message): Failed to find nvToolsExt Call Stack (most recent call first):

This looks related to https://discuss.pytorch.org/t/failed-to-find-nvtoolsext/179635
When I install it using conda, it somehow tries to install the CPU version.
This is the pytorch version I have:
pytorch 2.4.1 py3.10_cuda12.4_cudnn9.1.0_0
pytorch-cuda 12.4 hc786d27_6

When I enforce the version I want, this is the error I get:

conda install -c conda-forge nnpops=0.6=cuda120py310h3ec4162_11

Channels:
 - conda-forge
 - defaults
 - nvidia
 - pytorch
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides __cuda needed by pytorch-2.4.0-cuda118_py310h954aa82_300

Could not solve for environment specs
The following package could not be installed
└─ nnpops ==0.6 cuda120py310h3ec4162_11 is installable and it requires
   └─ pytorch [* cuda*|>=2.4.0,<2.5.0a0 ] with the potential options
      ├─ pytorch [2.4.0|2.4.1], which can be installed;
      ├─ pytorch [2.4.0|2.4.1] would require
      │  └─ __cuda, which is missing on the system;
      └─ pytorch * conflicts with any installable versions previously reported.

I am looking for ways to get this working with what I have. Is there a way to achieve that, or do I need to downgrade?
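
For context, `__cuda` is a virtual package that conda derives from the NVIDIA driver it detects, so this error may mean that conda did not see a driver on the machine doing the solve. A minimal sketch for checking that follows; `detected_cuda` is a hypothetical helper, and it assumes `conda info` lists virtual packages in the form `__cuda=<version>=<build>`.

```shell
# Hypothetical helper: reads `conda info` output on stdin and prints the
# __cuda virtual package entry, if one is listed.
detected_cuda() {
    grep -o '__cuda=[^[:space:]]*'
}

# Example: report what the local conda installation detects, if conda exists.
if command -v conda >/dev/null 2>&1; then
    conda info | detected_cuda || echo "__cuda not detected"
fi

# conda documents the CONDA_OVERRIDE_CUDA variable for overriding detection.
# Whether that is the right fix here is an assumption; the value should
# match what your driver actually supports.
# CONDA_OVERRIDE_CUDA=12.4 conda install -c conda-forge nnpops=0.6=cuda120py310h3ec4162_11
```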

@cagrikymk

I resolved the issue.

In case anyone has a similar problem, I specified the CUDA and nvcc path while running cmake:
cmake .. -DTorch_DIR=$(python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)')/Torch -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc

Also, I am using PyTorch 2.5, and I had to raise CXX_STANDARD and CUDA_STANDARD to 17 in CMakeLists.txt.
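
The cmake command above relies on `CUDA_HOME` pointing at a toolkit that actually contains nvcc. A small guard like the one below can catch an unset or wrong value before configuring. This is a sketch only: `nvcc_path` is a hypothetical helper, and the `bin/nvcc` layout is assumed from the `-DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc` argument.

```shell
# Hypothetical helper: prints the nvcc path under the given CUDA root, or
# fails with a message on stderr if no executable nvcc is found there.
nvcc_path() {
    cuda_home="$1"
    if [ -n "$cuda_home" ] && [ -x "$cuda_home/bin/nvcc" ]; then
        echo "$cuda_home/bin/nvcc"
    else
        echo "no executable nvcc under '$cuda_home/bin'" >&2
        return 1
    fi
}

# Example: validate CUDA_HOME when it is set.
if [ -n "${CUDA_HOME:-}" ]; then
    nvcc_path "$CUDA_HOME" || echo "fix CUDA_HOME before running cmake"
fi
```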
