Running python3 PyTorch/cv/resnet50/train.py produces the following output:
(torchml) tim@tim-pc:~/DirectML$ python3 PyTorch/cv/resnet50/train.py
Dropped Escape call with ulEscapeCode : 0x03007703
Loading the training dataset from: /home/tim/DirectML/PyTorch/cv/data/cifar-10-python
Train data X [N, C, H, W]:
shape=torch.Size([32, 3, 224, 224]),
dtype=torch.float32
Train data Y:
shape=torch.Size([32]),
dtype=torch.int64
Loading the testing dataset from: /home/tim/DirectML/PyTorch/cv/data/cifar-10-python
Test data X [N, C, H, W]:
shape=torch.Size([32, 3, 224, 224]),
dtype=torch.float32
Test data Y:
shape=torch.Size([32]),
dtype=torch.int64
Finished moving resnet50 to device: privateuseone:0 in 2.6226043701171875e-06s.
Epoch 1
-------------------------------
D3D12: Removing Device.
Traceback (most recent call last):
File "/home/tim/DirectML/PyTorch/cv/resnet50/train.py", line 39, in <module>
main()
File "/home/tim/DirectML/PyTorch/cv/resnet50/train.py", line 34, in main
train(args.path, args.batch_size, args.epochs, args.learning_rate,
File "/home/tim/DirectML/PyTorch/cv/classification/train_classification.py", line 131, in main
train(training_dataloader,
File "/home/tim/DirectML/PyTorch/cv/classification/train_classification.py", line 84, in train
batch_loss = loss(model(X), y)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torchvision/models/resnet.py", line 285, in forward
return self._forward_impl(x)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torchvision/models/resnet.py", line 269, in _forward_impl
x = self.bn1(x)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 175, in forward
return F.batch_norm(
File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/functional.py", line 2482, in batch_norm
return torch.batch_norm(
RuntimeError: Unknown error -2005270521
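The number in the RuntimeError looks like a signed 32-bit HRESULT. As a rough sketch (the helper names to_hresult and describe are made up for illustration, and the lookup table lists only three DXGI device-loss codes from the Windows SDK headers), it can be converted to the usual hex form and matched against those codes. This only reads the code; it does not explain why the device was lost.

```python
def to_hresult(code: int) -> str:
    """Reinterpret a signed 32-bit integer as an unsigned HRESULT in hex."""
    return f"0x{code & 0xFFFFFFFF:08X}"


# Small, incomplete table of DXGI device-loss codes (assumption: these are
# the only ones of interest here; many other HRESULT values exist).
DXGI_DEVICE_ERRORS = {
    "0x887A0005": "DXGI_ERROR_DEVICE_REMOVED",
    "0x887A0006": "DXGI_ERROR_DEVICE_HUNG",
    "0x887A0007": "DXGI_ERROR_DEVICE_RESET",
}


def describe(code: int) -> tuple:
    """Return (hex HRESULT, symbolic name or 'unknown') for an error number."""
    hresult = to_hresult(code)
    return hresult, DXGI_DEVICE_ERRORS.get(hresult, "unknown")


if __name__ == "__main__":
    # The value printed in the traceback above.
    print(describe(-2005270521))  # ('0x887A0007', 'DXGI_ERROR_DEVICE_RESET')
```

Under this reading, the error is a device reset, which would be consistent with the "D3D12: Removing Device." line printed just before the traceback.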
After several attempts, I found that the error occurs in the call to torch.batch_norm(). I then built a simple torch.nn.Sequential() model that only runs batch normalization, and it worked as expected.
Also, my iGPU has 8 GB of VRAM, and only ~3.5 GB is in use before the program crashes, so there is still plenty of free memory.
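For reference, an isolated batch normalization test of this kind might look roughly like the sketch below. It is not the exact script I ran: the function names pick_device and run_batch_norm are made up for illustration, the layer sizes copy conv1 and bn1 from torchvision's resnet50, and the code falls back to the CPU when the torch_directml package is not installed.

```python
import torch
from torch import nn


def pick_device():
    """Return the DirectML device if torch_directml is installed, else the CPU."""
    try:
        import torch_directml  # provided by the torch-directml package
        return torch_directml.device()
    except ImportError:
        return torch.device("cpu")


def run_batch_norm(batch_size=32, device=None):
    """Run one training-mode forward pass through conv + batch norm."""
    device = device if device is not None else pick_device()
    # Same shapes as the first two layers of torchvision's resnet50.
    model = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64),
    ).to(device)
    model.train()  # training mode makes batch norm compute batch statistics
    # Random input with the same shape as the training batches in the log.
    x = torch.randn(batch_size, 3, 224, 224).to(device)
    out = model(x)
    return tuple(out.shape)


if __name__ == "__main__":
    print(run_batch_norm())
```

With the default batch size of 32, the input shape matches the [32, 3, 224, 224] batches shown in the log, so a successful run here suggests the failure depends on more than the batch norm call alone.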
System details:
Python version: 3.10.0
WSL version: 2.3.26.0
WSL kernel version: 5.15.167.4-1
GPU: AMD Radeon 780M
DirectX version: 12.1
PyTorch version:
I also tried:
and the same error occurred.