Slow speed with torchANI simulations on the GPU #35
A significant part of the computation overhead is coming from libTorch, which is more or less constant regardless of the GPU.
The zip file you posted is empty. There are no files in it. Are you using ANI for the entire system, or just the butane? If the latter, I'm not surprised there's little difference. The Ampere GPU is much wider (many more compute units) than the Maxwell one. With so few atoms, it probably isn't able to fill the whole GPU, and as @raimis points out, the speed will largely be dominated by overhead. If you're using ANI for the whole system, though, I would have expected a bigger difference.
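A back-of-the-envelope sketch of why a wider GPU barely helps a small system: if each step costs a fixed per-step overhead plus compute work divided by GPU throughput, the overhead term dominates for few atoms and the speedup saturates. All the numbers below are illustrative assumptions, not measurements:

```python
def step_time(n_atoms, throughput, overhead=1e-4):
    """Toy cost model: per-step wall time = fixed overhead + work / throughput.
    All constants here are illustrative, not measured."""
    work = n_atoms * 1e-7  # hypothetical per-atom compute cost
    return overhead + work / throughput

# Compare a baseline GPU (throughput 1.0) against a "wider" GPU (3x throughput):
small = step_time(14, 1.0) / step_time(14, 3.0)          # butane-sized system
large = step_time(100_000, 1.0) / step_time(100_000, 3.0)  # large system

print(round(small, 2))  # close to 1: fixed overhead dominates, almost no speedup
print(round(large, 2))  # approaches 3: compute dominates, full speedup
```

The exact numbers are made up, but the shape of the result is the point: below some system size, a faster GPU cannot show its advantage because the constant overhead sets the floor.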
I see, so this is the rate-limiting step in the calculation.
Ah my bad. I've updated the link above in my earlier comment to the correct zip file.
I'm using ANI just for the butane molecule, but perhaps I can run the same benchmark using ANI for the whole system.
Hm, so I took a stab at profiling CPU vs GPU integration time with a
Interestingly, if I use the
(all times are in seconds). It seems like just getting the energies/forces from the
How did you measure the time? Did you do something like repeatedly calling

I would measure the speed by running a lot of steps at once. Something like this:

```python
from datetime import datetime

context.getState(getEnergy=True)  # empty out the queue first
time1 = datetime.now()
integrator.step(1000)
context.getState(getEnergy=True)  # force all the steps to finish
time2 = datetime.now()
```

Querying the energy forces it to finish all work that has been queued up to that point. So we first empty out the queue, then take a lot of steps, then again query the energy to force it to finish all the steps.
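The flush-then-time pattern described above can be sketched end-to-end. `FakeIntegrator` and the `flush` callable here are hypothetical stand-ins for an OpenMM integrator and the energy query, just to show the structure of the measurement:

```python
from datetime import datetime

class FakeIntegrator:
    """Hypothetical stand-in for an OpenMM integrator."""
    def __init__(self):
        self.steps_taken = 0

    def step(self, n):
        self.steps_taken += n

def benchmark(integrator, flush, n_steps=1000):
    flush()                    # empty the queue (e.g. query the energy)
    t1 = datetime.now()
    integrator.step(n_steps)   # queue many steps at once
    flush()                    # force all queued steps to finish
    t2 = datetime.now()
    return (t2 - t1).total_seconds() / n_steps  # seconds per step

integ = FakeIntegrator()
per_step = benchmark(integ, flush=lambda: None)
print(integ.steps_taken)  # 1000
```

Timing many steps between two synchronization points amortizes the cost of the synchronization itself, which is exactly why per-step timing with a query after every step would overstate the real cost.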
We just made a tutorial on how to speed up TorchANI with NNPOps: https://github.com/openmm/openmm-torch/blob/master/tutorials/openmm-torch-nnpops.ipynb
Sorry if this isn't the most relevant question to ask here, but I'm curious about this:
Could you elaborate on how libTorch is contributing to the computational cost (apart from the cost of the model itself, of course)?
I think he was referring to kernel launch overhead and host-device data transfers. They're both mostly independent of the GPU. We've done work recently to try to reduce them by using CUDA graphs.
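CUDA graphs reduce launch overhead by capturing a sequence of kernel launches once and then replaying the whole sequence with a single submission. The following is a toy pure-Python analogy of the capture-and-replay idea, not the real CUDA API:

```python
class Graph:
    """Toy capture-and-replay structure, analogous in spirit to a CUDA graph."""
    def __init__(self):
        self.ops = []
        self.capturing = False

    def begin_capture(self):
        self.capturing = True

    def launch(self, fn, *args):
        if self.capturing:
            self.ops.append((fn, args))  # record the call instead of running it
        else:
            fn(*args)

    def end_capture(self):
        self.capturing = False

    def replay(self):
        # One replay call runs the whole recorded sequence, so the per-call
        # dispatch cost is paid once per sequence instead of once per op.
        for fn, args in self.ops:
            fn(*args)

results = []
g = Graph()
g.begin_capture()
g.launch(results.append, 1)
g.launch(results.append, 2)
g.end_capture()          # nothing has executed yet; results is still []
for _ in range(3):
    g.replay()
print(results)  # [1, 2, 1, 2, 1, 2]
```

In the real thing, each recorded "op" is a GPU kernel launch, and replaying the graph replaces many individual launches (each with host-side overhead) with one.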
Hi, I benchmarked a simple system with GAFF and `torchANI` potentials on two different GPUs: a GTX Titan X (Maxwell) and an RTX 3090 (Ampere). For GAFF, I'm getting a 2x speedup on the 3090 compared to the Titan X (see table below). However, the two GPUs achieved very similar speeds when running with the `torchANI` potential. Is this something I should expect for simulations with the `torchANI` potential or `openmm-torch` in general? Could the molecule be too small to utilize the GPU resources?

My test system is a butane molecule in a water box (1000 water molecules) with the GAFF force field. I minimized and equilibrated the system with GAFF before running simulations with the `torchANI` potential.

butane-benchmark.zip