SYCL: Alternative solution to avoid runtime error of launching kernel on xpu:1 when que… #769

Merged
merged 11 commits into from
Sep 29, 2024

Contributor

@fengyuan14 fengyuan14 commented Aug 16, 2024

…rying SYCL kernel bundle ahead
The kernel is built only for the first device, so launching it on any other device raises a runtime error. As a temporary alternative, this change gives the SYCL runtime an extra hint by querying the SYCL kernel bundle ahead of the launch. See intel/llvm#15127.

@fengyuan14 fengyuan14 marked this pull request as ready for review August 16, 2024 06:09
@EikanWang
Contributor

@fengyuan14, by the way, please add some detailed information here.

Comment on lines +19 to +20
auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(
    ctx, {dev}, {kid});
Contributor

@fengyuan14, could you submit a GitHub issue to github.com/intel/llvm and then add a comment here linking to that issue?

Overall, LGTM.

@fengyuan14 fengyuan14 changed the title SYCL: Walk around runtime error of launching kernel on xpu:1 when que… SYCL: Alternative solution to avoid runtime error of launching kernel on xpu:1 when que… Aug 19, 2024
- name: Run XPU OP Examples
if: contains(inputs.ut, 'op_example') || github.event_name == 'schedule'
- name: Run XPU OP Regressions test
if: contains(inputs.ut, 'op_regression') || github.event_name == 'schedule'
Contributor

Where do those test cases come from? From the previous examples?

Contributor Author

Yes, just renaming them.

class TestOperationOnDevice1(TestCase):
    def test_sum_on_device1(self, dtype=torch.float):
        if torch.xpu.device_count() >= 2:
Contributor

On a single-card node, this test will be skipped by default, right?

Contributor Author

Yes.
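
For readers following the thread above: with a plain `if torch.xpu.device_count() >= 2:` guard, the test body simply does not run on a single-card node and the test is reported as passed. A minimal sketch of one way to make the skip visible in the test report is shown below. It is not the code merged in this PR; it assumes the standard-library `unittest.TestCase` rather than the `TestCase` used in the PR, the helper names `xpu_device_count` and `run_suite` are hypothetical, and the test body is only an illustrative guess at what a sum on the second device might look like.

```python
import unittest


def xpu_device_count() -> int:
    """Return the number of visible XPU devices, or 0 if torch or XPU support is missing.

    Hypothetical helper, not part of the PR.
    """
    try:
        import torch
    except ImportError:
        return 0
    xpu = getattr(torch, "xpu", None)
    if xpu is None:
        return 0
    try:
        return xpu.device_count()
    except Exception:
        # Treat any runtime/driver problem as "no devices" for the purpose of skipping.
        return 0


class TestOperationOnDevice1(unittest.TestCase):
    def test_sum_on_device1(self):
        # Report an explicit skip instead of silently passing on single-card nodes.
        if xpu_device_count() < 2:
            self.skipTest("needs at least two XPU devices")

        import torch

        # Assumed test body: run a reduction on the second device (xpu:1).
        x = torch.ones(16, dtype=torch.float, device="xpu:1")
        self.assertEqual(x.sum().item(), 16.0)


def run_suite() -> unittest.TestResult:
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOperationOnDevice1)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

On a node with fewer than two XPU devices the runner lists the test as skipped with the given reason; on a multi-card node the body runs against `xpu:1`.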

Contributor

@chuanqi129 chuanqi129 left a comment

LGTM. CC: @RUIJIEZHONG66166 @mengfei25 to be aware of this UT structure change.

@fengyuan14
Contributor Author

@fengyuan14 , by the way, please give some detailed information here.

Done

@fengyuan14 fengyuan14 added this pull request to the merge queue Sep 29, 2024
Merged via the queue into main with commit 459f92c Sep 29, 2024
3 checks passed
@fengyuan14 fengyuan14 deleted the fy/multi-dev branch September 29, 2024 02:53