
Slow speed with torchANI simulations on the GPU #35

Open
jeff231li opened this issue May 14, 2021 · 10 comments
Labels
help wanted Extra attention is needed

Comments

@jeff231li commented May 14, 2021

Hi, I benchmarked a simple system with GAFF and torchANI potentials on two different GPUs: GTX Titan X (Maxwell) and RTX 3090 (Ampere). For GAFF, I'm getting a 2x speedup on the 3090 compared to the Titan X (see table below). However, the two GPUs achieve very similar speeds when running with the torchANI potential. Is this something I should expect for simulations with the torchANI potential, or openmm-torch in general? Could the molecule be too small to utilize the GPU resources?

GPU         | GAFF (ns/day) | torchANI (ns/day)
GTX Titan X | 490           | 5.9
RTX 3090    | 970           | 7.7

My test system is a butane molecule in a water box (1000 water molecules) with the GAFF force field. I minimized and equilibrated the system first with GAFF before running simulations with the torchANI potential. butane-benchmark.zip
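Roughly, the kind of setup I mean, as a sketch using openmm-ml's MLPotential (the file names and the 14-atom butane index list are placeholders, not the actual benchmark inputs):

from openmm import app, unit
from openmmml import MLPotential

# Placeholder Amber files for the butane-in-water system
prmtop = app.AmberPrmtopFile('butane_water.prmtop')

# Plain GAFF/TIP3P system
mm_system = prmtop.createSystem(nonbondedMethod=app.PME,
                                nonbondedCutoff=0.9*unit.nanometer,
                                constraints=app.HBonds)

# Replace the intramolecular butane terms with ANI-2x
# (butane assumed to be the first 14 atoms of the topology)
butane_atoms = list(range(14))
potential = MLPotential('ani2x')
mixed_system = potential.createMixedSystem(prmtop.topology, mm_system, butane_atoms)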

@raimis (Contributor) commented May 14, 2021

A significant part of the computation overhead is coming from libTorch, which is more or less constant regardless of the GPU.

@peastman (Member)
The zip file you posted is empty. There are no files in it.

Are you using ANI for the entire system, or just the butane? If the latter, I'm not surprised there's little difference. The Ampere GPU is much wider (many more compute units) than the Maxwell one. With that few atoms, it probably isn't able to fill the whole GPU. And as @raimis points out, the speed will largely be dominated by overhead. If you're using ANI for the whole system, though, I would have expected a bigger difference.

@jeff231li (Author)

> A significant part of the computation overhead is coming from libTorch, which is more or less constant regardless of the GPU.

I see, so this is the rate-limiting step in the calculation.

> The zip file you posted is empty. There are no files in it.

Ah, my bad. I've updated the link in my earlier comment to point to the correct zip file.

> Are you using ANI for the entire system, or just the butane? ... If you're using ANI for the whole system, though, I would have expected a bigger difference.

I'm using ANI just for the butane molecule, but perhaps I can run the same benchmark using ANI for the whole system.
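For the whole-system run, the same MLPotential sketch from my first comment would just build the full system directly from the topology (again, treat this as a rough sketch of the openmm-ml API):

# ANI-2x for every atom (butane + water), reusing `potential` and `prmtop`
# from the sketch in my first comment
ani_system = potential.createSystem(prmtop.topology)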

@dominicrufa
Hm, so I took a stab at profiling CPU vs GPU integration time with and without a TorchForce attached. I took the HostGuestExplicit test system, treated the guest molecule (30 atoms) with the TorchANI ANI-2x module, and ran 100 MD steps on the GPU and CPU (labelled ml). As a control, I ran the system again on the GPU and CPU without the TorchANI force (labelled mm).
[plot: integration-time profiles for the four runs (mm and ml, on CPU and GPU)]
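To be concrete about what I mean by integration time, a minimal sketch of the per-step timing (assuming an OpenMM Simulation object called `simulation`, already built on the platform being profiled):

import time

# Wall-clock time of each of the 100 individual MD steps
step_times = []
for _ in range(100):
    t0 = time.perf_counter()
    simulation.step(1)
    step_times.append(time.perf_counter() - t0)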

@dominicrufa
@peastman, @raimis, I was surprised by the variance in the TorchANI-equipped GPU simulation compared to the other three integration-time profiles. Is there any speculation as to why this might be the case?

@dominicrufa commented Jun 4, 2021

Interestingly, if I use context.getState and pull the energy/forces on the GPU, I see that:

the energy evaluation time of the non-torch forces is: 0.0001876354217529297
the force evaluation time of the non-torch forces is: 0.002471446990966797
the energy evaluation time of the torch forces is: 0.0010063648223876953
the force evaluation time of the torch forces is: 0.003879547119140625

(all times are in seconds)

It seems like just getting the energies/forces from the TorchForce via getState is really cheap compared to what happens in the integration scheme?
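For completeness, a sketch of how the torch/non-torch split can be timed separately, assuming the TorchForce has been placed in its own force group (`torch_force` and `context` are placeholders for the objects in my script):

import time

# Put the TorchForce in its own force group so it can be queried on its own
torch_force.setForceGroup(1)
context.reinitialize(preserveState=True)

t0 = time.perf_counter()
context.getState(getEnergy=True, getForces=True, groups={0})   # non-torch forces
t_mm = time.perf_counter() - t0

t0 = time.perf_counter()
context.getState(getEnergy=True, getForces=True, groups={1})   # TorchForce only
t_ml = time.perf_counter() - t0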

@peastman (Member) commented Jun 4, 2021

How did you measure the time? Did you do something like repeatedly calling step(1) and checking the time after each call? If so, you're probably seeing an artifact of the measurement process. Launching kernels on the GPU is asynchronous. It just adds the kernel to a queue and returns immediately. step() will usually return before it has actually finished computing the step.

I would measure the speed by running a lot of steps at once. Something like this:

from datetime import datetime

context.getState(getEnergy=True)   # flush any work already queued on the GPU
time1 = datetime.now()
integrator.step(1000)
context.getState(getEnergy=True)   # block until all 1000 steps have finished
time2 = datetime.now()

Querying the energy forces OpenMM to finish all work that has been queued up to that point. So we first empty out the queue, then take a lot of steps, then query the energy again to force it to finish all the steps.
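To turn the elapsed time into a ns/day figure like the ones in the table above, something like this works (assuming a 2 fs time step):

elapsed = (time2 - time1).total_seconds()
ns_simulated = 1000 * 2e-6            # 1000 steps at 2 fs, expressed in ns
print(ns_simulated / elapsed * 86400, 'ns/day')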

raimis added the help wanted (Extra attention is needed) label on Mar 1, 2022
@raimis (Contributor) commented Mar 1, 2022

We just made a tutorial on how to speed up TorchANI with NNPOps: https://github.com/openmm/openmm-torch/blob/master/tutorials/openmm-torch-nnpops.ipynb
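The gist of the tutorial is to wrap the stock TorchANI model with the NNPOps-optimized implementation; roughly like this (a sketch, see the notebook for the exact, current API):

import torch
import torchani
from NNPOps import OptimizedTorchANI

device = torch.device('cuda')
# Atomic numbers of the ML region, e.g. butane (placeholder ordering)
species = torch.tensor([[6, 6, 6, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device=device)

model = torchani.models.ANI2x(periodic_table_index=True).to(device)
model = OptimizedTorchANI(model, species).to(device)   # swap in the NNPOps kernels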

@JustinAiras
Sorry if this isn't the most relevant question to ask here, but I'm curious about this:

> A significant part of the computation overhead is coming from libTorch, which is more or less constant regardless of the GPU.

Could you elaborate on how libTorch is contributing to the computational cost (apart from the cost of the model itself, of course)?

@peastman (Member) commented Jul 5, 2023

I think he was referring to kernel launch overhead and host-device data transfers. They're both mostly independent of the GPU. We've done work recently to try to reduce them by using CUDA graphs.
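For anyone finding this later: if I remember right, recent openmm-torch versions expose CUDA graphs as a property on the TorchForce, roughly like this (a sketch from memory; check the openmm-torch README for the exact property names):

from openmmtorch import TorchForce

torch_force = TorchForce('model.pt')                 # serialized TorchScript module
torch_force.setProperty('useCUDAGraphs', 'true')     # ask openmm-torch to capture the model in a CUDA graph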
