I'm trying to use multiple GPUs to train on 160,000 images.
There are 8 GPUs, and I want to use GPUs 1, 2, 3, 4, and 5, since GPUs 0, 6, and 7 are busy. So I set export CUDA_VISIBLE_DEVICES=1,2,3,4,5
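As a side note on reading device numbers: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0, so indices reported inside the process are logical, not physical. The sketch below only illustrates that mapping; the helper name `visible_device_map` is hypothetical, and it assumes the variable holds plain integer indices (it can also hold UUIDs, which this does not handle).

```python
def visible_device_map(cuda_visible_devices: str) -> dict:
    """Map the index a CUDA process sees to the physical GPU index.

    Assumes a comma-separated list of integer indices.
    """
    physical = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    # The first listed GPU becomes device 0, the second device 1, and so on.
    return dict(enumerate(physical))


mapping = visible_device_map("1,2,3,4,5")
print(mapping)  # {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
```

Under this setting, "device 0" in a PyTorch message would be physical GPU 1 and "device 1" would be physical GPU 2.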
I get the following error message:
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/raid/home/CW01/uia64053/external-algos/unsupervised-depth/SC-SfMLearner-Release/models/DispResNet.py", line 116, in forward
outputs = self.decoder(features)
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/raid/home/CW01/uia64053/external-algos/unsupervised-depth/SC-SfMLearner-Release/models/DispResNet.py", line 91, in forward
x = self.convs[("upconv", i, 0)](x)
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/raid/home/CW01/uia64053/external-algos/unsupervised-depth/SC-SfMLearner-Release/models/DispResNet.py", line 23, in forward
out = self.conv(x)
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/raid/home/CW01/uia64053/external-algos/unsupervised-depth/SC-SfMLearner-Release/models/DispResNet.py", line 41, in forward
out = self.conv(out)
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
.....
File "/home/CW01/uia64053/anaconda3/envs/sc_sfmlearner/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
So it seems the data tensor and the model are not on the same GPU?
Thanks
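One possible explanation, not confirmed in this thread: the decoder line in the traceback indexes `self.convs` with a key like `("upconv", i, 0)`, which suggests the layers are looked up in a plain Python dict. PyTorch only tracks submodules that are assigned as attributes or held in containers such as `nn.ModuleList` or `nn.ModuleDict`. As far as I understand, `nn.DataParallel` replicas share any plain dict with the original model, so a replica on another GPU could end up calling layers whose weights still sit on the first device. The sketch below uses hypothetical class names and a single toy layer, and only shows the registration difference; it is not the repository's code.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


class PlainDictDecoder(nn.Module):
    """Toy decoder that keeps its layer in a plain dict (not registered)."""

    def __init__(self):
        super().__init__()
        self.convs = OrderedDict()
        self.convs[("upconv", 0, 0)] = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        return self.convs[("upconv", 0, 0)](x)


class ModuleDictDecoder(nn.Module):
    """Same toy decoder, but the layer lives in an nn.ModuleDict."""

    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleDict()
        self.convs[self._key("upconv", 0, 0)] = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    @staticmethod
    def _key(name, i, j):
        # nn.ModuleDict needs string keys, so the tuple is flattened.
        return f"{name}_{i}_{j}"

    def forward(self, x):
        return self.convs[self._key("upconv", 0, 0)](x)


# Count the parameters each module reports to PyTorch.
plain_params = len(list(PlainDictDecoder().parameters()))
registered_params = len(list(ModuleDictDecoder().parameters()))
print(plain_params, registered_params)  # 0 2

# Both variants compute the same kind of output on a single device.
out = ModuleDictDecoder()(torch.randn(1, 3, 8, 8))
```

If the real decoder works this way, looking layers up through a registered container in `forward` would let each replica use its own copy of the weights.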
The solution in the pinned issue works, but it leaves the load unbalanced across the GPUs. (Maybe the loss should be computed inside the model for data parallelism.)
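A minimal sketch of the "loss inside the model" idea, under assumptions: `ModelWithLoss` is a hypothetical wrapper, and the single conv layer and `nn.L1Loss` are placeholders, not the networks or losses used in this repository. Because the loss is computed in `forward`, each `nn.DataParallel` replica evaluates it on its own GPU, and only one small value per replica is gathered on the first device.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the network and the loss in one forward pass."""

    def __init__(self, net, criterion):
        super().__init__()
        self.net = net
        self.criterion = criterion

    def forward(self, inputs, targets):
        loss = self.criterion(self.net(inputs), targets)
        # Return shape (1,) so DataParallel can gather one value per replica.
        return loss.unsqueeze(0)


net = nn.Conv2d(3, 1, kernel_size=3, padding=1)  # placeholder network
wrapped = nn.DataParallel(ModelWithLoss(net, nn.L1Loss()))
if torch.cuda.is_available():
    wrapped = wrapped.cuda()

device = next(wrapped.parameters()).device
inputs = torch.randn(4, 3, 8, 8, device=device)
targets = torch.randn(4, 1, 8, 8, device=device)

# Average the per-replica losses, then backpropagate as usual.
loss = wrapped(inputs, targets).mean()
loss.backward()
```

The average of per-replica means matches the full-batch mean only when the batch splits evenly across GPUs, so an uneven last batch would weight samples slightly differently.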