
PI_ERROR_INVALID_QUEUE after copying device 0 tensor to device 1 #745

Closed
daisyden opened this issue Aug 12, 2024 · 9 comments

@daisyden (Contributor)

🐛 Describe the bug

import torch
a = torch.empty(3, device=torch.device('xpu:0'))
a.fill_(1.1)
b = a.to(device='xpu:1')
a.device
b.device
print(b.cpu())
print(b)

Report:

tensor([1.1000, 1.1000, 1.1000])
Traceback (most recent call last):
  File "/home/gta/daisyden/pytorch4/test/aa.py", line 8, in <module>
    print(b)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 714, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 631, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 363, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 152, in __init__
    nonzero_finite_vals = torch.masked_select(
RuntimeError: Native API failed. Native API returns: -36 (PI_ERROR_INVALID_QUEUE) -36 (PI_ERROR_INVALID_QUEUE)

Versions

latest version

@daisyden daisyden changed the title PI_ERROR_INVALID_QUEUE after copy device 0 tensor to device 1 PI_ERROR_INVALID_QUEUE after copying device 0 tensor to device 1 Aug 12, 2024
@fengyuan14 (Contributor) commented Aug 12, 2024

This is a SYCL runtime issue.

Per the latest SYCL spec, we are recommended to use info::kernel_device_specific::work_group_size instead of info::device::max_work_group_size. But a new issue was found: after querying info::kernel_device_specific::work_group_size, the kernel cannot be launched successfully on PVC Tile 1 and fails with a runtime error.
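For context, a minimal, hypothetical SYCL C++ sketch of the two queries being discussed (the kernel name and program structure are placeholders, not the actual torch-xpu-ops code): the device-wide query needs only the device, while the kernel-specific query needs an executable kernel bundle and is evaluated per kernel and per device.

#include <sycl/sycl.hpp>
#include <iostream>

// Placeholder kernel name for illustration only.
class demo_kernel;

int main() {
  sycl::queue q;
  sycl::device dev = q.get_device();
  sycl::context ctx = q.get_context();

  // Older, device-wide upper bound.
  size_t dev_max = dev.get_info<sycl::info::device::max_work_group_size>();

  // Recommended, kernel-specific bound: requires the executable kernel bundle.
  auto kid = sycl::get_kernel_id<demo_kernel>();
  auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctx, {kid});
  sycl::kernel k = bundle.get_kernel(kid);
  size_t krn_max = k.get_info<sycl::info::kernel_device_specific::work_group_size>(dev);

  std::cout << "device max: " << dev_max << ", kernel max: " << krn_max << "\n";

  // Launch the named kernel so the kernel id above resolves to a real kernel.
  q.parallel_for<demo_kernel>(sycl::range<1>{krn_max}, [=](sycl::id<1>) {}).wait();
}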

@chuanqi129 chuanqi129 modified the milestones: PT2.6, PT2.5 Aug 13, 2024
@daisyden daisyden mentioned this issue Aug 13, 2024
@daisyden (Contributor, Author) commented Aug 13, 2024

Duplicate of #339.

@fengyuan14 (Contributor)

The issue is common to all platforms with more than one device. The most important and most common case for us is the client case, where a client platform/desktop has an iGPU and a dGPU.

@fengyuan14 (Contributor)

intel/llvm#15127

@ddkalamk

@fengyuan14 can we please apply the available workaround to fix this problem?

i.e. change
https://github.com/intel/torch-xpu-ops/blob/main/src/comm/DeviceProperties.h#L19C3-L20C79
auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(ctx, {kid});

to

auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(ctx, {dev}, {kid});
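
To make the difference concrete, here is a hypothetical, self-contained sketch (placeholder kernel name, not the actual DeviceProperties.h code): the two-argument overload builds the bundle for every device in the context, while the three-argument overload restricts it to the single device the kernel will actually run on.

#include <sycl/sycl.hpp>
#include <vector>

// Placeholder kernel name for illustration only.
class probe_kernel;

int main() {
  sycl::queue q;
  sycl::device dev = q.get_device();
  sycl::context ctx = q.get_context();

  auto kid = sycl::get_kernel_id<probe_kernel>();

  // Current form: the bundle targets every device in ctx.
  //   auto kbundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctx, {kid});
  // Proposed workaround: restrict the bundle to the device we will launch on.
  auto kbundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
      ctx, std::vector<sycl::device>{dev}, {kid});

  sycl::kernel k = kbundle.get_kernel(kid);
  size_t wg = k.get_info<sycl::info::kernel_device_specific::work_group_size>(dev);

  // Launch the named kernel so the kernel id above resolves to a real kernel.
  q.parallel_for<probe_kernel>(sycl::range<1>{wg}, [=](sycl::id<1>) {}).wait();
}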

@ddkalamk

@daisyden @fengyuan14
Test results after applying the fix:

(pt_src) [ddkalamk@pcl-pvc01 pytorch]$ cat test2.py
import torch
print("PyTorch version: ", torch.__version__)
a = torch.empty(3, device=torch.device('xpu:0'))
a.fill_(1.1)
b = a.to(device='xpu:1')
a.device
b.device
print(b.cpu())
print(b)

(pt_src) [ddkalamk@pcl-pvc01 pytorch]$ python -u test2.py
PyTorch version:  2.5.0a0+git8693322
tensor([1.1000, 1.1000, 1.1000])
tensor([1.1000, 1.1000, 1.1000], device='xpu:1')

@fengyuan14 (Contributor) commented Sep 13, 2024

Hi @ddkalamk, we have a PR for it on the main branch: #769. We have been busy with the PT2.5 release recently, but will land the PR as soon as possible.

@ddkalamk

Sounds good, thanks.

@chuanqi129 chuanqi129 modified the milestones: PT2.5, PT2.6 Oct 14, 2024
@fengyuan14 (Contributor)

The workaround has been merged into the main branch: #769.
