Hello,
I encountered a CUDA error during training and set CUDA_LAUNCH_BLOCKING=1 to get a more accurate traceback. Below is the full error message I received:
File "train.py", line 221, in <module>
train(net, loader_train, loader_test, optimizer, criterion)
File "train.py", line 66, in train
out = net(x)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DEA-RWKV/code/model/backbone_train.py", line 113, in forward
x8 = self.level3_VRWKV8(x8, patch_resolution)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 356, in forward
x = _inner_forward(x)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 349, in _inner_forward
x = x + self.drop_path(self.att(self.ln1(x), patch_resolution))
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 239, in forward
x = _inner_forward(x)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 232, in _inner_forward
x = RUN_CUDA_RWKV6(B, T, C, self.n_head, r, k, v, w, u=self.time_faaaa)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 66, in RUN_CUDA_RWKV6
return WKV_6.apply(B, T, C, H, r, k, v, w, u)
(Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:104.)
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "train.py", line 221, in <module>
train(net, loader_train, loader_test, optimizer, criterion)
File "train.py", line 72, in train
loss.backward()
File "/root/miniconda3/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/function.py", line 253, in apply
return user_fn(self, *args)
File "/root/DEA-RWKV/code/model/vrwkv6.py", line 62, in backward
gu = torch.sum(gu, 0).view(H, C//H)
RuntimeError: CUDA error: an illegal memory access was encountered
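For anyone trying to reproduce this, here is a minimal sketch of one way to launch the script with CUDA_LAUNCH_BLOCKING=1 set before CUDA is initialized. The helper names `blocking_env` and `run_training` are hypothetical and not part of the repository; the sketch only assumes that `train.py` is the entry point shown in the traceback above.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper for illustration; not part of the DEA-RWKV code.
    """
    env = dict(os.environ if base is None else base)
    # With CUDA_LAUNCH_BLOCKING=1 kernel launches run synchronously, so an
    # error is reported at the call that caused it rather than at a later one.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(script="train.py"):
    """Run the training script in a child process with the debug environment."""
    # The variable is set in the child's environment before Python starts,
    # so it is in place before torch initializes CUDA.
    return subprocess.run([sys.executable, script], env=blocking_env()).returncode
```

Calling `run_training()` from the directory that contains `train.py` should be equivalent to running `CUDA_LAUNCH_BLOCKING=1 python train.py` in a shell.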
Hope to get your reply!
Thank you.