Unknown error -2005270521 was caught when testing torch-directml resnet50 demo #672

Open
Basicname opened this issue Nov 30, 2024 · 0 comments


Issue description:

Running python3 PyTorch/cv/resnet50/train.py produces the following output:

(torchml) tim@tim-pc:~/DirectML$ python3 PyTorch/cv/resnet50/train.py
Dropped Escape call with ulEscapeCode : 0x03007703
Loading the training dataset from: /home/tim/DirectML/PyTorch/cv/data/cifar-10-python
        Train data X [N, C, H, W]:
                shape=torch.Size([32, 3, 224, 224]),
                dtype=torch.float32
        Train data Y:
                shape=torch.Size([32]),
                dtype=torch.int64
Loading the testing dataset from: /home/tim/DirectML/PyTorch/cv/data/cifar-10-python
        Test data X [N, C, H, W]:
                shape=torch.Size([32, 3, 224, 224]),
                dtype=torch.float32
        Test data Y:
                shape=torch.Size([32]),
                dtype=torch.int64
Finished moving resnet50 to device: privateuseone:0 in 2.6226043701171875e-06s.
Epoch 1
-------------------------------
D3D12: Removing Device.
Traceback (most recent call last):
  File "/home/tim/DirectML/PyTorch/cv/resnet50/train.py", line 39, in <module>
    main()
  File "/home/tim/DirectML/PyTorch/cv/resnet50/train.py", line 34, in main
    train(args.path, args.batch_size, args.epochs, args.learning_rate,
  File "/home/tim/DirectML/PyTorch/cv/classification/train_classification.py", line 131, in main
    train(training_dataloader,
  File "/home/tim/DirectML/PyTorch/cv/classification/train_classification.py", line 84, in train
    batch_loss = loss(model(X), y)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torchvision/models/resnet.py", line 269, in _forward_impl
    x = self.bn1(x)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 175, in forward
    return F.batch_norm(
  File "/home/tim/anaconda3/envs/torchml/lib/python3.10/site-packages/torch/nn/functional.py", line 2482, in batch_norm
    return torch.batch_norm(
RuntimeError: Unknown error -2005270521
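
The negative number in the last line of the traceback can be read as a signed 32-bit Windows HRESULT. The sketch below converts it to the usual unsigned hexadecimal form and looks it up in a small table. The helper name `to_hresult` and the table `DXGI_ERRORS` are hypothetical and not part of torch-directml; the table lists only a few DXGI error constants and should be checked against the official DXGI error list before drawing conclusions.

```python
def to_hresult(code: int) -> str:
    """Format a signed 32-bit error code as an unsigned hexadecimal HRESULT."""
    return f"0x{code & 0xFFFFFFFF:08X}"


# Assumption: a hand-copied subset of the DXGI_ERROR constants.
# Verify these values against the official DXGI documentation.
DXGI_ERRORS = {
    "0x887A0005": "DXGI_ERROR_DEVICE_REMOVED",
    "0x887A0006": "DXGI_ERROR_DEVICE_HUNG",
    "0x887A0007": "DXGI_ERROR_DEVICE_RESET",
    "0x887A0020": "DXGI_ERROR_DRIVER_INTERNAL_ERROR",
}

hresult = to_hresult(-2005270521)
print(hresult, DXGI_ERRORS.get(hresult, "not in this table"))
# prints: 0x887A0007 DXGI_ERROR_DEVICE_RESET
```

If that reading is right, the code is consistent with the "D3D12: Removing Device." line printed just before the traceback: the GPU device appears to have been reset or removed during the batch normalization call, rather than PyTorch raising an ordinary shape or dtype error.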

After several attempts, I found that the error occurs inside torch.batch_norm(). To isolate it, I built a simple torch.nn.Sequential() model that runs only batch normalization, and that works as expected.
Memory does not appear to be the problem either: my iGPU has 8 GB of VRAM, and only about 3.5 GB is in use before the program crashes.
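
For anyone who wants to try an isolated test at the same size the traceback fails on, here is a minimal sketch. It is not the script used for the test described above. It assumes the failing layer is ResNet-50's first batch normalization (`self.bn1`), which in torchvision is a `BatchNorm2d(64)` that receives a `[32, 64, 112, 112]` tensor when the input batch is `[32, 3, 224, 224]`. The function name `probe_batch_norm` is hypothetical; `torch_directml.device()` is the call the torch-directml package provides for selecting the DirectML device, and the script falls back to the CPU when the package is not installed.

```python
import torch


def probe_batch_norm(device, batch_size=32, channels=64, size=112):
    """Run one training-mode BatchNorm2d forward pass and return the output shape."""
    bn = torch.nn.BatchNorm2d(channels).to(device)
    bn.train()  # training mode, as in train.py, so batch statistics are computed
    x = torch.randn(batch_size, channels, size, size).to(device)
    y = bn(x)
    # Copying back to the CPU forces the device work to finish,
    # so a device-side failure should surface here at the latest.
    return tuple(y.cpu().shape)


if __name__ == "__main__":
    try:
        import torch_directml  # only available where torch-directml is installed
        device = torch_directml.device()
    except ImportError:
        device = torch.device("cpu")
    print(device, probe_batch_norm(device))
```

If this passes on the DirectML device while the full model still fails, the problem may depend on what runs before `bn1` (the preceding convolution, or the total memory and command load) rather than on batch normalization alone.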

System details:

Python version: 3.10.0
WSL version: 2.3.26.0
WSL kernel version: 5.15.167.4-1
GPU: AMD Radeon 780M
DirectX version: 12.1
PyTorch package versions:

torch                    2.2.1
torch-directml           0.2.1.dev240521
torchvision              0.17.1

I also tried:

torch                    2.4.1
torch-directml           0.2.5.dev240914
torchvision              0.19.1

and the same error occurred.
