[Bug]: test_vmap fails on multi-node runs on hardware accelerators #1627
What happened?
When running on more than one node with GPUs enabled, test_vmap fails. Needs further investigation.
Code snippet triggering the error
On Horeka, using accelerated nodes, the test fails when run on 2 nodes with 3 or 4 ranks each.
HEAT_TEST_USE_DEVICE=gpu mpirun --report-bindings -N 3/4 pytest heat/core/tests/test_vmap.py
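For anyone reproducing this, the `-N 3/4` in the command above is shorthand for two separate runs, one with `-N 3` and one with `-N 4`. The Python sketch below just spells out both commands and the resulting total rank counts. It assumes the 2-node allocation comes from the scheduler rather than from the command itself; the helper names `launch_commands` and `total_ranks` are made up for illustration and are not part of Heat.

```python
# Sketch: enumerate the failing launch configurations from the report.
# Assumption: the node count (2) is set by the job allocation, not by mpirun flags.
NODES = 2
RANKS_PER_NODE = (3, 4)  # "-N 3/4" read as two separate runs
TEST_FILE = "heat/core/tests/test_vmap.py"


def launch_commands(ranks_per_node=RANKS_PER_NODE, test_file=TEST_FILE):
    """Return one shell command string per ranks-per-node value."""
    return [
        f"HEAT_TEST_USE_DEVICE=gpu mpirun --report-bindings -N {n} pytest {test_file}"
        for n in ranks_per_node
    ]


def total_ranks(nodes=NODES, ranks_per_node=RANKS_PER_NODE):
    """Map each ranks-per-node value to the total number of MPI ranks."""
    return {n: nodes * n for n in ranks_per_node}


if __name__ == "__main__":
    for cmd in launch_commands():
        print(cmd)
    print(total_ranks())
```

So the two failing configurations correspond to 6 and 8 MPI ranks in total.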
Error message or erroneous outcome
The result of the test does not match the expected outcome.
FAILED heat/core/tests/test_vmap.py::TestVmap::test_vmap - AssertionError: False is not true
Version
main (development branch)
Python version
3.11.2
PyTorch version
2.2.2
Cuda version
12.2
MPI version