I'm having some issues with CUDA while running LAMMPS with Allegro. For training the model and deploying it into a .pth file, I used 4 GPUs (gres=gpu:4), 1 node and 20 cores. It ran successfully and I obtained a .pth file.
When I run LAMMPS with the .pth file using the same GPU and node specifications, I get the following error message:
.
.
.
reading velocities ...
3916 velocities
read_data CPU = 0.068 seconds
ERROR (Allegro): my rank (4) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (13) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (15) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (16) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (18) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (19) is bigger than the number of visible devices (4)!!!
Allegro is using device cuda:0
Allegro is using device cuda:1
Allegro is using device cuda:2
Allegro is using device cuda:3
ERROR (Allegro): my rank (6) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (7) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (8) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (9) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (10) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (11) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (12) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (14) is bigger than the number of visible devices (4)!!!
ERROR (Allegro): my rank (17) is bigger than the number of visible devices (4)!!!
Allegro: Loading model from water.pth
Allegro: Loading model from water.pth
Allegro: Loading model from water.pth
Allegro: Loading model from water.pth
ERROR (Allegro): my rank (5) is bigger than the number of visible devices (4)!!!
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
.
.
.
How do I fix this?
Thanks!
Kuntal
P.S. I am very new to Allegro and to GPUs in general, so please pardon my rookie mistakes!
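For reference, the log above is consistent with one MPI rank being started per core (20 ranks) while only 4 GPUs are visible. Each rank appears to claim the CUDA device whose index equals its own rank. The sketch below only reproduces that pattern from the numbers in this post. The function name `assign_devices` is hypothetical, and the rank-to-device rule is inferred from the error text, not taken from the Allegro source.

```python
def assign_devices(n_ranks, n_visible_devices):
    """Map each rank to a CUDA device name, or None when no device is left.

    Assumption (inferred from the log, not from the Allegro code): rank i
    tries to use device cuda:i and fails if i >= number of visible devices.
    """
    assignment = {}
    for rank in range(n_ranks):
        if rank < n_visible_devices:
            assignment[rank] = f"cuda:{rank}"
        else:
            assignment[rank] = None
    return assignment


# Numbers from the post: 20 cores (assumed to mean 20 MPI ranks) and 4 GPUs.
mapping = assign_devices(n_ranks=20, n_visible_devices=4)
ok = [rank for rank, device in mapping.items() if device is not None]
failed = [rank for rank, device in mapping.items() if device is None]

print("ranks with a device:", ok)
print("ranks without a device:", failed)
```

Under that assumption, ranks 0 to 3 get `cuda:0` to `cuda:3` and ranks 4 to 19 have no device, which matches the 16 error lines in the log. If this reading is right, launching LAMMPS with as many MPI ranks as visible GPUs (4 here) would likely avoid the error.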