Performance: Nonzero: Worse host overhead compared with IPEX #969

Open
fengyuan14 opened this issue Oct 16, 2024 · 1 comment

@fengyuan14
Contributor

🐛 Describe the bug

Variant       Name           Self CPU %  Self CPU  CPU total %  CPU total  CPU time avg  Self XPU  Self XPU %  XPU total  XPU time avg  # of Calls
non override  aten::nonzero  5.50%       63.456ms  54.60%       630.302ms  489.365us     5.160ms   4.35%       34.468ms   26.761us      1288
override      aten::nonzero  5.40%       58.551ms  52.64%       570.870ms  443.222us     6.688ms   5.55%       34.737ms   26.970us      1288
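The per-call gap can be recomputed from the profiler totals. This is a small arithmetic sketch that assumes, as the issue title suggests, that the "non override" row is the torch-xpu-ops kernel and the "override" row is the IPEX 2.3 kernel; the table itself does not label the rows that way.

```python
# Figures copied from the profiler table (aten::nonzero, 1288 calls per variant).
CALLS = 1288
cpu_total_ms = {"non override": 630.302, "override": 570.870}
xpu_total_ms = {"non override": 34.468, "override": 34.737}


def per_call_us(total_ms: float, calls: int = CALLS) -> float:
    """Convert a profiler total in milliseconds to an average in microseconds per call."""
    return total_ms * 1000.0 / calls


# Host (CPU) side: difference in average time per call and relative increase.
cpu_gap_us = per_call_us(cpu_total_ms["non override"]) - per_call_us(cpu_total_ms["override"])
cpu_ratio = cpu_total_ms["non override"] / cpu_total_ms["override"]

# Device (XPU) side: difference in average time per call.
xpu_gap_us = per_call_us(xpu_total_ms["non override"]) - per_call_us(xpu_total_ms["override"])

print(f"host gap: {cpu_gap_us:.1f} us/call ({(cpu_ratio - 1) * 100:.1f}% more CPU time)")
print(f"device gap: {xpu_gap_us:.2f} us/call")
```

The recomputed averages match the "CPU time avg" and "XPU time avg" columns. Host time differs by about 46 us per call (roughly 10%), while device time differs by only about 0.2 us per call, which is why the regression reads as host overhead rather than kernel cost.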

Versions

Latest torch-xpu-ops vs. the IPEX 2.3 implementation.

@majing921201
Contributor

The low performance is caused by the SYCL API we use to query the kernel-specific maximum work-group size. We filed an issue against the compiler to track it: intel/llvm#15824
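To illustrate why a slow query shows up as host overhead, the sketch below counts how often a stand-in query runs when it is issued on every launch versus memoized per (kernel, device) pair. Everything here is hypothetical: `slow_max_work_group_size`, `cached_max_work_group_size`, `launch`, the `"xpu:0"` device name, and the placeholder value 1024 are illustrative only. In SYCL 2020 such a query is typically written as `kernel.get_info<sycl::info::kernel_device_specific::work_group_size>(device)`, but the comment does not say which exact call is involved, and this is not how torch-xpu-ops or IPEX is known to handle it.

```python
from functools import lru_cache

# Counts calls to the stand-in query so the two strategies can be compared.
query_count = 0


def slow_max_work_group_size(kernel_name: str, device_name: str) -> int:
    """Hypothetical stand-in for an expensive runtime query (e.g. a SYCL
    kernel_device_specific::work_group_size lookup); it only counts calls."""
    global query_count
    query_count += 1
    return 1024  # placeholder value, not a real device limit


@lru_cache(maxsize=None)
def cached_max_work_group_size(kernel_name: str, device_name: str) -> int:
    # For a fixed kernel and device the limit does not change between launches,
    # so the answer can be memoized per (kernel, device) pair.
    return slow_max_work_group_size(kernel_name, device_name)


def launch(kernel_name: str, device_name: str, use_cache: bool) -> int:
    """Hypothetical launch path: look up the work-group size before each launch."""
    query = cached_max_work_group_size if use_cache else slow_max_work_group_size
    return query(kernel_name, device_name)


CALLS = 1288  # same call count as in the profiler table

for _ in range(CALLS):
    launch("nonzero", "xpu:0", use_cache=False)
uncached = query_count

query_count = 0
for _ in range(CALLS):
    launch("nonzero", "xpu:0", use_cache=True)
cached = query_count

print(uncached, cached)  # 1288 1
```

With the query on every launch, its cost is paid 1288 times; memoized, it is paid once per (kernel, device) pair. Whether that trade-off is acceptable depends on the runtime, which is why the root cause is tracked in the compiler issue.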

riverliuintel modified the milestones: PT2.6, PT2.7 on Nov 21, 2024