[Bug]: bug (or feature (?)) in gather when gathering from several GPU-devices #1171

Labels: gather
What happened?
When performing gather-operations on torch tensors on several GPUs, the tensors in the resulting list of tensors are still on different devices (though all on the same MPI-rank, surprisingly); see the example below with 8 MPI-processes on 2 nodes with 4 GPUs each.

This is actually the problem that causes print to fail when communication over GPU is required (see #1121, #1170): at the end of _torch_data, the local data are gathered from all MPI-ranks, and the concatenation via torch.cat fails because the gathered local arrays are still on different devices...

Code snippet triggering the error

run with
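The issue's original snippet and launch command are not reproduced above. As a minimal sketch of the effect, assuming plain mpi4py and torch rather than Heat's own communication layer (ranks, shapes, and the rank-to-GPU mapping below are illustrative assumptions), launched e.g. with mpirun across 8 ranks on 2 nodes:

```python
# Minimal sketch (NOT the issue's original reproducer): gather torch CUDA
# tensors from all ranks onto rank 0 and inspect their devices.
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# each rank places its local chunk on "its own" GPU (assumed mapping)
device = torch.device("cuda", rank % torch.cuda.device_count())
local = torch.full((2,), float(rank), device=device)

# gather all local tensors onto rank 0
gathered = comm.gather(local, root=0)

if rank == 0:
    # observed effect: the gathered tensors still report different CUDA devices
    print([t.device for t in gathered])
    # this is what fails at the end of _torch_data: concatenating tensors
    # that live on different devices raises a RuntimeError
    torch.cat(gathered)
```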
I would have expected all tensors in the list on process 0 to be on device 'cuda:0' of node 1, since (at least in my impression) this is the device where MPI-process 0 should "live"...