Compatibility with AMD GPUs #111

Open
svandenhaute opened this issue Jul 11, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@svandenhaute

Does this plugin work with AMD GPUs? PyTorch+ROCm is not installable via conda...

@peastman
Member

It includes an OpenCL implementation which works with AMD GPUs. However, I believe the PyTorch model will get executed on the CPU and only the OpenMM calculations will run on the GPU.

@svandenhaute
Author

svandenhaute commented Jul 11, 2023

Right, but given that the PyTorch evaluation constitutes most of the calculation time (at least for relatively small systems and complex torch models), this would turn out to be rather slow, no?

EDIT: PyTorch+ROCm is easily installable via pip. Would there be an easy way to patch things together?

@peastman
Member

PyTorch+ROCm is easily installable via pip. Would there be an easy way to patch things together?

There are two parts to that question. First, could PyTorch installed with pip and OpenMM installed with conda be made to work together? Possibly. It depends on whether they were compiled in ways that make them binary compatible. If not, you could always compile OpenMM from source.

The second part is what it would take to make TorchForce work with PyTorch using HIP. Take a look at the CUDA implementation in https://github.com/openmm/openmm-torch/blob/master/platforms/cuda/src/CudaTorchKernels.cpp for a sense of what it involves. It needs to move the model to the correct device, and also ensure that any tensors it creates are on the correct device.

const torch::Device device(torch::kCUDA, cu.getDeviceIndex()); // This implicitly initializes PyTorch
module.to(device); // Move the TorchScript model onto the same GPU OpenMM is using
torch::TensorOptions options = torch::TensorOptions().device(device).dtype(cu.getUseDoublePrecision() ? torch::kFloat64 : torch::kFloat32); // Tensors created later inherit this device and precision

Then there's the fact that all the data for the tensors is stored on the GPU, so all accesses to it have to happen in the correct way. You could avoid that complexity by just bringing all data back to the CPU when communicating between PyTorch and OpenMM, though that would increase overhead.
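
For illustration, a rough sketch of that CPU round-trip, reusing the `device` from the snippet above (the variable names and the assumption that `positions` is a flat host-side array OpenMM has already downloaded are placeholders, not actual TorchForce code):

// Sketch only: exchange data with OpenMM through CPU buffers, wherever the model runs.
torch::Tensor posCpu = torch::from_blob(positions.data(), {numParticles, 3}, torch::kFloat64).clone();
torch::Tensor posDev = posCpu.to(device).requires_grad_(true); // copy positions onto the model's device
torch::Tensor energy = module.forward({posDev}).toTensor();    // evaluate the model
energy.backward();                                             // forces = -dE/dx
torch::Tensor forcesCpu = (-posDev.grad()).to(torch::kCPU).contiguous();
// forcesCpu.data_ptr<double>() can now be handed back to OpenMM on the host side.
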

Finally there's some bookkeeping needed to keep everything working properly, such as calls to cuDevicePrimaryCtxRetain() and cuDevicePrimaryCtxRelease(). HIP is modelled after CUDA, so converting everything probably wouldn't be hard.
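
Very roughly, the HIP version of that bookkeeping might look like the following (a sketch only: the hipDevicePrimaryCtx* calls exist in the HIP runtime API and mirror the cuDevicePrimaryCtx* ones, but the surrounding names are hypothetical, and you'd want to confirm that PyTorch's ROCm build still reports its devices as torch::kCUDA):

// Hypothetical HIP analogue of the CUDA bookkeeping; requires <hip/hip_runtime.h>.
hipCtx_t primaryContext;
hipDevicePrimaryCtxRetain(&primaryContext, deviceIndex); // keep the primary context alive, like cuDevicePrimaryCtxRetain()

// PyTorch's ROCm build presents its devices as CUDA, so the device type stays torch::kCUDA.
const torch::Device device(torch::kCUDA, deviceIndex);
module.to(device);

// ...and when the kernel is destroyed:
hipDevicePrimaryCtxRelease(deviceIndex); // matches cuDevicePrimaryCtxRelease()
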

Of course, we couldn't distribute it through conda-forge, since conda-forge doesn't support HIP. :(
