-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SYCL: using SYCL group algorithm API instead of old style for sub group shift utilities #635
Conversation
f80a4c8
to
c94931a
Compare
THe reason is that |
Verify on c++17, std::pair is not a C++ trivially copyable structure.
|
Verified,
|
Hi - I think several cases where you're migrating to shifts or permutes could also be greatly simplified by using SYCL 2020 reduce_over_group at either sub-group or work-group scope. |
Additionally, any calls to nd_item::barrier should move to sycl::group_barrier. |
@jbrodman Thanks for the remainder. Yes. It's on our plan. We reviewed all APIs which should move to group algorithm API in IPEX two years ago. But at that time, these APIs, like group_barrier or reduce_over_group, got some performance issues. So we didn't adopt them. For example, memory fence implicated in Exactly we should follow SYCL spec to move to new APIs and cowork with implementation team to optimize performance. |
@xytintel Ready to preci? |
I think yes |
src/comm/XPUPair.h
Outdated
|
||
// specializations for tuple_size | ||
template <> | ||
struct tuple_size<tuple<>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if tuple
and tuple_size
are valid structs as they are defined as forward declarations.
We just pushed a fix into IGC that should solve the barrier performance issue. It would be a huge benefit for the SYCL compiler and runtime to dogfood things like the group algorithms so we can make sure they're performing properly. |
src/comm/XPUPair.h
Outdated
template <typename T1, typename T2> | ||
inline void swap(T1& a, T2& b) { | ||
T1 temp = a; | ||
a = b; | ||
b = temp; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this reimplementing swap and not using std::swap
? This doesn't work correctly if T1
or T2
have their custom swap specialization.
src/comm/XPUPair.h
Outdated
inline void swap(pair& p) { | ||
swap(first, p.first); | ||
swap(second, p, second); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use std::swap
and typo "p, second".
inline void swap(pair& p) { | |
swap(first, p.first); | |
swap(second, p, second); | |
} | |
inline void swap(pair& p) { | |
using std::swap; | |
swap(first, p.first); | |
swap(second, p.second); | |
} |
src/comm/XPUPair.h
Outdated
|
||
template <unsigned int N, typename T1, typename T2> | ||
inline typename tuple_element<N, pair<T1, T2>>::type& get(pair<T1, T2>& p) { | ||
return detail::pair_get<N, pair<T1, T2>>()(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary complicated.
return detail::pair_get<N, pair<T1, T2>>()(p); | |
if constexpr(N==0) return p.first; else return p.second.; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with your proposal. I will remove the unnecessary code.
We will change the APIs gradually. Thanks. |
@rolandschulz Any more comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no other comments
src/comm/XPUPair.h
Outdated
pair(const std::pair<U1, U2>& p) : first(p.first), second(p.second) {} | ||
|
||
inline void swap(pair& p) { | ||
std::swap(first, p.first); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't work with custom swap (e.g. if you have pair<pair<...>...>
). Correct use needs to enable ADL. For details see e.g. https://stackoverflow.com/questions/28130671/how-does-using-stdswap-enable-argument-dependent-lookup-adl.
Nit: This member function seems unnecessary. The implementation could go directly into the free-function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xytintel Please address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
src/comm/XPUPair.h
Outdated
|
||
template <typename T1, typename T2> | ||
inline void swap(pair<T1, T2>& x, pair<T1, T2>& y) { | ||
return x.swap(y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you removed it but didn't move the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Changes: * Added a hack to fix 2035 in oneDNN * Commented out oneapi specific location for libOpenCL.so * Added hacks in torch-xpu-ops to WA dpc++ and intel/llvm behavior differences Above are hacks which need proper resolutions. See: oneapi-src/oneDNN#2035 Requires: intel/torch-xpu-ops#635 Signed-off-by: Dmitry Rogozhkin <[email protected]>
Change 1:
SYCL is shifting to SYCL group algorithm API to unify sub-group, work-group APIs, e.g. reduce_over_group, group_barrier, and shift_group here. And old style (use separate member function of each structure) is being deprecated.
FYI: Please don't use sg.shuffle members anymore since they are deprecated.
Change 2:
sycl::shfit_group_xxx
is more restricted thansg.shuffle
, requiring the object shifted is C++ trivially copyable object. We implemented privatepair
instead ofstd::pair
in the commit.FYI: CUDA is using
thrust::pair
in kernel.