rodinia/gaussian sycl and ndpx implementation #307
Overall looks in good shape!
dpbench/benchmarks/rodinia/gaussian/gaussian_sycl_native_ext/gaussian_sycl/_gaussian_kernel.hpp
dpbench/benchmarks/rodinia/gaussian/gaussian_sycl_native_ext/gaussian_sycl/_gaussian_sycl.cpp
640d447 to 77fafac
These numbers look scary: a roughly 500x slowdown in numba_dpex relative to sycl.
@roxx30198 please split the commits so that one commit is responsible for the infrastructure changes and another for the gaussian implementation. And let's set the input so that the dpex implementation does not exceed 1s for the S input. @adarshyoga do the input sizes for M and L sound reasonable to you?
0074445 to 1820e49
We would need to experiment with this workload to arrive at appropriate values for M and L. I think we can merge this PR with the current values of M and L. We can change them as we do deeper analysis into this workload and into Rodinia in general.
4179cf5 to 5bec653
6855db3 to 263776f
0ba2a06 to 8da8d99
LGTM! Thank you!
Added the sycl and numba-dpex implementations of the Gaussian elimination benchmark from Rodinia.
Runtime for a 100x100 matrix with fp64 on CPU:
sycl:
max execution times: 12.190916ms (12190916 ns)
min execution times: 6.433963ms (6433963 ns)
median execution times: 7.595746ms (7595746 ns)
repeats: 10
numba_dpex_k:
max execution times: 3.994273296s (3994273296 ns)
min execution times: 2.814416425s (2814416425 ns)
median execution times: 3.585554875s (3585554875 ns)
repeats: 10
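For reference, the slowdown discussed in this thread can be checked directly from the timings above. The snippet below is only a small arithmetic sketch: the nanosecond values are copied from the report, while the `timings_ns` dictionary and the `slowdown` helper are illustrative names and not part of dpbench. The medians work out to a ratio of about 470x, which is in line with the "500X" estimate.

```python
# Timings in nanoseconds, copied from the benchmark report above.
# The dictionary layout and helper name are illustrative, not dpbench APIs.
timings_ns = {
    "sycl": {"max": 12190916, "min": 6433963, "median": 7595746},
    "numba_dpex_k": {"max": 3994273296, "min": 2814416425, "median": 3585554875},
}


def slowdown(stat: str) -> float:
    """Return how many times slower numba_dpex_k is than sycl for one statistic."""
    return timings_ns["numba_dpex_k"][stat] / timings_ns["sycl"][stat]


if __name__ == "__main__":
    for stat in ("min", "median", "max"):
        print(f"{stat}: {slowdown(stat):.0f}x slower")
```

The same comparison can be repeated for any new S, M, or L input size by swapping in the reported values.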