Bugs

🍏Kernel Level Pipelining

According to this discussion on the Khronos forum: https://forums.khronos.org/showthread.php/13494-Is-this-undefined-behavior

using this feature is not undefined behavior, so the rest of this part is no longer valid:

(deprecated)

Using "event driven" and "driver controlled" pipelining (many partial in-device duplicates derived from the same kernel) to divide a single kernel is outside the OpenCL spec, but it seems to work on AMD and Intel GPUs for now (2017). When pipelining is enabled (by passing a boolean "true" as a parameter to compute()), the API slices a region into N parts and reads/writes only those parts with clEnqueueReadBuffer/clEnqueueWriteBuffer commands per (sliced and offset) kernel. **Multiple kernels writing to different regions of a buffer at the same time is undefined behavior (concurrent reading is fine).**
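To make the slicing concrete, here is a minimal sketch in Python of how a range could be cut into N offset slices. It is not the library's actual code: the names `slice_range` and `regions_are_disjoint` are hypothetical, and spreading the remainder over the first slices is an assumption made only for illustration.

```python
def slice_range(total_items, num_slices):
    """Split [0, total_items) into contiguous (offset, count) pairs.

    Hypothetical helper: the remainder is spread over the first slices,
    which is an assumption and not necessarily what the API does.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    base, extra = divmod(total_items, num_slices)
    slices = []
    offset = 0
    for i in range(num_slices):
        count = base + (1 if i < extra else 0)
        slices.append((offset, count))
        offset += count
    return slices


def regions_are_disjoint(slices):
    """Return True if no two (offset, count) regions overlap."""
    ordered = sorted(slices)
    return all(
        a_off + a_cnt <= b_off
        for (a_off, a_cnt), (b_off, _) in zip(ordered, ordered[1:])
    )


if __name__ == "__main__":
    parts = slice_range(1024, 4)
    print(parts)
    print(regions_are_disjoint(parts))
```

Each (offset, count) pair stands for the region that one sliced kernel, and its matching partial buffer read/write, would touch. The slices never overlap, so the concern discussed here is only about concurrent writes into the same buffer object, not about the same elements.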

The tests so far appear to work and have shown no errors in the data or in the OpenCL error codes. Still, use it at your own risk.

Driver - event pipelining page:

https://github.com/tugrul512bit/Cekirdekler/wiki/Pipelining

Since this is just a boolean switch, you can turn it off easily and trade some performance for stability.

How else can I hide the latencies of a single kernel and a single buffer? I tried some stairway (event) overlapping and it worked right away on (AMD) HD7870, R7-240, RX-550, FX8150 and (Intel) HD400, N3060. Then I tried it with "free" queues, which made it even faster.
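The stairway idea can be pictured as a staggered schedule in which slice t is uploaded while slice t-1 is computed and slice t-2 is downloaded. The Python sketch below only prints such a schedule. The three-stage split and the name `stairway_schedule` are assumptions for illustration, and the code does not reproduce how the library actually chains OpenCL events.

```python
def stairway_schedule(num_slices):
    """Build a staggered (upload, compute, download) schedule.

    Each step is a tuple of slice indices, with None for an idle stage.
    Assumes three stages per slice, which is an illustrative choice.
    """
    steps = []
    for t in range(num_slices + 2):
        upload = t if t < num_slices else None
        compute = t - 1 if 0 <= t - 1 < num_slices else None
        download = t - 2 if 0 <= t - 2 < num_slices else None
        steps.append((upload, compute, download))
    return steps


if __name__ == "__main__":
    for step, stages in enumerate(stairway_schedule(4)):
        print(step, stages)
```

With N slices the schedule takes N + 2 steps instead of 3 * N, which is where the latency hiding comes from in this simplified model.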

I noticed this "bug" only after reading [this Stack Overflow question](https://stackoverflow.com/questions/28505604/is-it-defined-to-write-to-the-same-buffer-from-different-kernels), because no such explanation was present [in this AMD OpenCL programming guide](http://developer.amd.com/wordpress/media/2013/07/AMD_Accelerated_Parallel_Processing_OpenCL_Programming_Guide-rev-2.7.pdf) and I joined the Khronos forums only recently.

(deprecated)

🍏Device Level Pipelining

This never overlaps two kernels on the same buffer (it uses double buffering), so the single-device pipelining feature does not have this bug.
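As a rough illustration of why double buffering avoids the problem, the Python sketch below alternates two buffer slots so that the slot being computed on is never the one being transferred in the same step. The name `double_buffer_plan` and the two-slot layout are hypothetical and only model the idea, not the library's implementation.

```python
def double_buffer_plan(num_steps):
    """Return (compute_slot, transfer_slot) for each pipeline step.

    Two slots (0 and 1) swap roles every step, so a kernel and a
    transfer never use the same slot at the same time.
    """
    plan = []
    for step in range(num_steps):
        compute_slot = step % 2
        transfer_slot = 1 - compute_slot
        plan.append((compute_slot, transfer_slot))
    return plan


if __name__ == "__main__":
    for step, (compute_slot, transfer_slot) in enumerate(double_buffer_plan(4)):
        print(f"step {step}: compute on {compute_slot}, transfer on {transfer_slot}")
```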

🍏System Level Pipelining

Since each device works in its own (explicitly controlled) context, there are no out-of-spec issues. The load balancing and device-to-device pipelining features (the latter again uses double buffering) are safe to use.

🍏Load Balancing

This uses multiple contexts too. No bugs.

As of v1.4.1_update4, the -cl-std=CL1.2 option is added to the OpenCL kernel compile options, so some previously failing devices may start working with this update. (If you are already using update 3, just replace the KutuphaneCL.dll file.)

If you ever see a logic error or undefined behavior, please send me an email:

huseyin (a dot here) tugrul (another dot char here) buyukisik (@ char as usual) gmail(dot)com
