Home
This GPGPU library makes it easier to handle buffer copies, compute on multiple devices, and build more complex OpenCL programs. One of its main features is an iterative load balancer that distributes a workload across multiple devices according to their performance. The balancing is dynamic across iterations but static at the kernel level: it adjusts each device's global range between iterations rather than using divide-and-conquer, which is very slow because of OpenCL API overhead on Windows. Both the performance advantage and the memory consistency of data distribution in this project come from replicating host-side buffers on the devices. This is somewhat similar to cluster computing, but within the OpenCL-capable devices of a single PC. A real cluster computing part will be added in the future.
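To make the idea of iterative balancing concrete, here is a minimal sketch of one rebalancing step. It is not taken from the library: the class `LoadBalancerSketch`, the method `Rebalance`, and the `damping` parameter are hypothetical names chosen for illustration, and the library's real balancer may use a different update rule. The sketch assumes each device reports how long its share of work-groups took in the previous iteration.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not part of the Cekirdekler API.
public static class LoadBalancerSketch
{
    // groups[i]    : work-groups device i ran in the last iteration (assumed >= 1)
    // elapsedMs[i] : time device i needed for them, in milliseconds
    // Returns the work-group counts for the next iteration; they sum to totalGroups.
    public static int[] Rebalance(int[] groups, double[] elapsedMs,
                                  int totalGroups, double damping = 0.5)
    {
        int n = groups.Length;

        // Throughput estimate: work-groups completed per millisecond.
        double[] speed = new double[n];
        for (int i = 0; i < n; i++)
            speed[i] = groups[i] / Math.Max(elapsedMs[i], 1e-6);
        double speedSum = speed.Sum();

        int[] next = new int[n];
        for (int i = 0; i < n; i++)
        {
            double ideal = totalGroups * speed[i] / speedSum;
            // Move only part of the way toward the ideal share to damp oscillation.
            double smoothed = groups[i] + damping * (ideal - groups[i]);
            next[i] = Math.Max(1, (int)Math.Round(smoothed));
        }

        // Correct rounding drift so the counts add up to totalGroups again.
        int diff = totalGroups - next.Sum();
        while (diff != 0)
        {
            // Surplus goes to the fastest device; a deficit is taken from the largest share.
            int idx = diff > 0 ? Array.IndexOf(speed, speed.Max())
                               : Array.IndexOf(next, next.Max());
            next[idx] += Math.Sign(diff);
            diff = totalGroups - next.Sum();
        }
        return next;
    }
}
```

For example, if two devices each ran 5 of 10 work-groups and took 10 ms and 30 ms, `Rebalance(new[] { 5, 5 }, new[] { 10.0, 30.0 }, 10)` shifts the split to 6 and 4. Repeating the step on later iterations moves the split further toward the devices' measured throughput ratio. Within a single kernel launch the split stays fixed, which matches the "dynamic across iterations, static at the kernel level" description above.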
(The picture below depicts a very simplified "shared-distributed memory model" that keeps kernels simple.)
This is a two-project library:
- The first project builds KutuphaneCL.dll, a C++ 64-bit build for Windows. Compile it from https://github.com/tugrul512bit/CekirdeklerCPP.
- The second project uses KutuphaneCL.dll (and Windows' Kernel32.dll for some array copies) and produces Cekirdekler.dll, which is also a 64-bit build.
You can then use Cekirdekler.dll in your projects by adding it as a reference and copying it next to your project's executable (along with KutuphaneCL.dll).
For now, it needs .NET 3.5, but it can easily be made .NET 2.0 compatible by adding two more files (System.Threading.dll and System.Threading.xml) from the .NET Framework (you can find them by searching for a "Parallel.For" DLL for .NET 2.0).
Lazy developers can get the compiled DLL files as a rar file in the main folder. They were built on a Celeron N3060, so don't expect miracles. Both DLLs are 64-bit builds.
This is what a hello-world looks like in C#:
ClNumberCruncher cr = new ClNumberCruncher(
    AcceleratorType.GPU, @"
    __kernel void hello(__global char * arr)
    {
        printf(""hello world"");
    }
");
ClArray<byte> array = new ClArray<byte>(1000);
// gpu-1 could be computing the 99th work-item while gpu-2 is computing the 100th,
// with groups of 100 work-items distributed to multiple GPUs.
array.compute(cr, 1, "hello", 1000, 100);
// No need to dispose anything; the objects clean up after themselves when they go out of scope or are garbage-collected.
Or, with explicit device selection:
ClNumberCruncher cr = new ClNumberCruncher(
    Hardware.ClPlatforms.all().gpus()[1] /* second GPU among the listed GPU devices */, @"
    __kernel void hello(__global char * arr)
    {
        printf(""hello world"");
    }
");
ClArray<byte> array = new ClArray<byte>(1000);
// You can use your own arrays too:
// byte[] myCSharpArray = new byte[1000];
// ClArray<byte> array = myCSharpArray;
array.compute(cr, 1, "hello", 1000, 100);
// No need to dispose anything; the objects clean up after themselves when they go out of scope or are garbage-collected.
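The hello-world comments mention that a global range of 1000 work-items in groups of 100 may end up on different GPUs. The following sketch shows one way such a split could be expressed as contiguous per-device ranges. It is an assumption for illustration only: `RangeSplitSketch`, `Offsets`, and `OwnerOf` are hypothetical names, and how the library partitions ranges internally is not described here.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Cekirdekler API.
public static class RangeSplitSketch
{
    // Returns the start offset of each device's contiguous work-item range,
    // followed by a final entry equal to globalRange (n + 1 entries for n devices).
    // Every device range is a whole number of work-groups of size localRange.
    public static int[] Offsets(int globalRange, int localRange, int[] groupsPerDevice)
    {
        if (localRange <= 0 || globalRange % localRange != 0)
            throw new ArgumentException("globalRange must be a multiple of localRange");

        int[] offsets = new int[groupsPerDevice.Length + 1];
        for (int i = 0; i < groupsPerDevice.Length; i++)
            offsets[i + 1] = offsets[i] + groupsPerDevice[i] * localRange;

        if (offsets[groupsPerDevice.Length] != globalRange)
            throw new ArgumentException("group counts must cover the whole global range");
        return offsets;
    }

    // Index of the device whose range contains the given work-item index.
    public static int OwnerOf(int workItem, int[] offsets)
    {
        for (int d = 0; d + 1 < offsets.Length; d++)
            if (workItem >= offsets[d] && workItem < offsets[d + 1])
                return d;
        throw new ArgumentOutOfRangeException("workItem");
    }
}
```

With `Offsets(1000, 100, new[] { 1, 9 })`, the first device covers work-item indices 0 to 99 and the second covers 100 to 999, so index 99 and index 100 run on different devices even though they are adjacent. Keeping each range a multiple of the local size means no work-group is ever split across devices.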
With kutuphanecl.dll v1.3.8 and earlier (from the CekirdeklerCPP project), this API selects only OpenCL 1.2 platforms and their devices. Support for later versions will come in a future update (and will probably merge with Vulkan).
With kutuphanecl.dll v1.4.1 and later (from the CekirdeklerCPP2 project), OpenCL 2.0 dynamic parallelism and kernel-only features (such as work-group reductions) are supported.