Bugs
🍏Kernel Level Pipelining
According to this discussion on the Khronos forum: https://forums.khronos.org/showthread.php/13494-Is-this-undefined-behavior
using this feature is not undefined behavior, so the rest of this part is no longer valid:
(deprecated)
Using "event driven" and "driver controlled" pipelining (many duplicated but partial kernels, derived from the same kernel, running in the device)
to divide a single kernel is outside the OpenCL spec, but it seems to work on AMD and Intel GPUs for now (2017). When it is enabled (by adding a boolean "true" value as a parameter to compute()), the API slices a region into N parts and reads/writes only those parts with clEnqueueReadBuffer/clEnqueueWriteBuffer commands, one set per sliced and offset kernel. **Multiple kernels writing to different regions of a buffer at the same time is undefined behavior (reading is OK).**
The tests that appeared to work have so far shown no errors in the data or in the OpenCL error codes. Still, use it at your own risk.
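The slicing described above can be illustrated with a small host-side sketch. It is not Cekirdekler's actual implementation and makes no OpenCL calls; the function names `slice_range` and `stairway_schedule` are hypothetical, and the sketch assumes the range divides evenly into N parts. It only shows how N (offset, length) slices can pass through write, compute and read stages so that stages running in the same step always touch different slices.

```python
def slice_range(global_size, parts):
    """Split [0, global_size) into `parts` equal (offset, length) slices."""
    # Assumption for this sketch: the range must divide evenly.
    if parts <= 0 or global_size <= 0 or global_size % parts != 0:
        raise ValueError("global_size must be a positive multiple of parts")
    length = global_size // parts
    return [(i * length, length) for i in range(parts)]


def stairway_schedule(global_size, parts):
    """Return, per time step, the list of (stage, slice) pairs that are active.

    "write"   stands for a host-to-device upload of one slice,
    "compute" stands for the partial kernel running on that slice,
    "read"    stands for the device-to-host download of that slice.
    Each slice enters one stage later than the slice before it.
    """
    slices = slice_range(global_size, parts)
    stages = ("write", "compute", "read")
    schedule = []
    for step in range(parts + len(stages) - 1):
        active = []
        for lag, stage in enumerate(stages):
            index = step - lag  # slice that reaches this stage at this step
            if 0 <= index < parts:
                active.append((stage, slices[index]))
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for step, active in enumerate(stairway_schedule(1024, 4)):
        print(step, active)
```

With 4 slices and 3 stages the schedule has 6 steps instead of the 12 a fully serialized version would need, which is where the latency hiding comes from.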
Driver - event pipelining page:
https://github.com/tugrul512bit/Cekirdekler/wiki/Pipelining
Since this is just a boolean switch, you can turn it off easily and trade some performance for stability.
How else can I hide the latencies of a single kernel and a single buffer? I tried some stairway (event) overlapping and it worked right away on (AMD) HD7870, R7-240, RX-550, FX8150 and (Intel) HD400, N3060; then I tried "free" queues, which made it even faster.
I noticed this "bug" only after reading [this stackoverflow question](https://stackoverflow.com/questions/28505604/is-it-defined-to-write-to-the-same-buffer-from-different-kernels), because no such explanation was present [in this page](http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf) and I joined the Khronos forums only recently.
(deprecated)
🍏Device Level Pipelining
This does not overlap any two kernels on the same buffer (it uses double buffering), so the single-device pipelining feature has no such bug.
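As a rough illustration of why double buffering avoids the overlap, the sketch below plans which of two buffers is used for computing and which for transferring at each step. The function name `double_buffer_plan` is hypothetical and this is not the library's code; it only shows the alternation, under the assumption that exactly two buffers are swapped every step.

```python
def double_buffer_plan(num_steps):
    """Return (compute_buffer, transfer_buffer) indices for each step.

    Two buffers (0 and 1) swap roles every step, so the kernel never
    works on the buffer that is being transferred at the same time.
    """
    plan = []
    for step in range(num_steps):
        compute_buffer = step % 2         # buffer the kernel works on
        transfer_buffer = (step + 1) % 2  # buffer being uploaded/downloaded
        plan.append((compute_buffer, transfer_buffer))
    return plan


if __name__ == "__main__":
    for step, (compute_buffer, transfer_buffer) in enumerate(double_buffer_plan(4)):
        print(f"step {step}: compute on {compute_buffer}, transfer on {transfer_buffer}")
```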
🍏System Level Pipelining
Since each device works in a different (explicitly controlled) context, there are no out-of-spec issues here. The load balancing and device-to-device pipelining features are safe to use (the latter, again, uses double buffering).
🍏Load Balancing
This uses multiple contexts too. No bugs.
As of v1.4.1_update4, the -cl-std=CL1.2 option is added to the OpenCL kernel compile options, so some devices that were failing may start working with this update. (If you are already using update 3, just replace the KutuphaneCL.dll file.)
If you ever see a logic error or undefined behavior, please send me an e-mail:
huseyin (a dot here) tugrul (another dot char here) buyukisik (@ char as usual) gmail(dot)com