-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: kernels for random.vonmisses
; part 2
#681
base: master
Are you sure you want to change the base?
Conversation
@@ -1242,65 +1243,70 @@ void dpnp_rng_vonmises_large_kappa_c(void* result, const _DataType mu, const _Da | |||
|
|||
Uvec = reinterpret_cast<_DataType*>(dpnp_memory_alloc_c(size * sizeof(_DataType))); | |||
Vvec = reinterpret_cast<_DataType*>(dpnp_memory_alloc_c(size * sizeof(_DataType))); | |||
|
|||
for (size_t n = 0; n < size;) | |||
n = reinterpret_cast<size_t*>(dpnp_memory_alloc_c(sizeof(size_t))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite strange (Make scalar as a array with one element).
I think it should be a scalar, not an array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scalar and this is the same. You just can not pass n to sycl region in other way.
Y = 0.0; | ||
else if (Y > 1.0) | ||
Y = 1.0; | ||
n[0] = n[0] + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mistake. This is parallel environment (SYCL kernel). Writing inside the kernel into same memory cause https://en.wikipedia.org/wiki/Race_condition
V = Vvec[i]; | ||
sn2 = sn * sn; | ||
cn2 = cn * cn; | ||
auto paral_kernel_some = [&](cl::sycl::handler& cgh) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel inside the loop with bigger trip count. It would be more efficient to parallelize (make kernel) the algorithm by bigger value size
instead size-n
. So, it will require a loop inside the kernel.
It is questionable what will be more performant
- loop with a kernels queue (data dependent)
- kernel with a loop
It is hard to predict it with no perf measurements but I would vote that parallelization with bigger number of threads should be better.
@samir-nasibli Is this PR ready to review or still in development stage? |
I will update this PR or move some part of this changes to another PR with closing this. |
random.vonmisses
random.vonmisses
; part 2
*n = *n + 1; | ||
result1[*n] = cl::sycl::asin(cl::sycl::sqrt(Y)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like here we are getting race condition, that is why we are getting wrong results. To prevent it we should calculate n
(index of result) from i
.
*n = *n + 1; | ||
result1[*n] = cl::sycl::acos(W); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like here we are getting race condition, that is why we are getting wrong results. To prevent it we should calculate n (index of result) from i.
* Fix race condition in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c * Rename arrays and change if condition from kernels in dpnp_rng_vonmises_large_kappa_c and dpnp_rng_vonmises_small_kappa_c * Add space * Fix indices in dpnp_rng_vonmises_small_kappa_c and dpnp_rng_vonmises_large_kappa_c
@LukichevaPolina |
The use of extra memory with the amount of data is not a good practice in optimization. We must avoid this cases. - ideas? |
using extra mem is brute force approach.
We need to investigate it.
|
Stale PR? |
Description
Enable computations on devices [CPU/GPU].
Tests
TODO
tests/test_random.py::TestDistributionsVonmises::test_seed
failed on both devices. Bug.