Buffer Handling: Local Range

OpenCL calls the number of work-items that share a local memory the "local work size", also known as the "local range", the "local size", or the "number of work-items in the work-group".

Cekirdekler API uses the local work size as the grain size for load balancing. When some devices are faster than others, the load balancer moves work from the slower devices to the faster ones in steps of at least one local work size. The developer is therefore responsible for providing enough work-groups to distribute across all devices. The grain size is the same for every device because that makes it quicker to compute a balance. For example, with 1024 work-items in total and a local size of 256 there are only 4 work-groups, so the API cannot use more than 4 GPUs. Decreasing the local size to 32 gives 32 work-groups, which should be enough for any number of GPUs on a single motherboard.
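The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration, not part of Cekirdekler API: the function name max_usable_devices is hypothetical, and the check that the total size is a multiple of the local size is an assumption carried over from the usual OpenCL 1.x rule.

```python
def max_usable_devices(global_size: int, local_size: int) -> int:
    """Return the number of work-groups, which is also the largest
    number of devices that can each receive at least one work-group."""
    # Assumption: the total range must divide evenly into work-groups.
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size


print(max_usable_devices(1024, 256))  # 4
print(max_usable_devices(1024, 32))   # 32
```

A smaller local size gives the balancer a finer grain, at the cost of fewer work-items sharing local memory in each work-group.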

     A.nextParam(B,C).compute(cruncher, 1, "vectorAdd", 1000, 100);

Here 100 is the local size, so the 1000 work-items form 10 (1000/100) work-groups, which are distributed among all devices found in cruncher.
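To make the idea of distributing whole work-groups concrete, here is a minimal Python sketch. It is not Cekirdekler's actual balancing algorithm: the function split_range and the integer speed weights are hypothetical, and the only property taken from the text above is that every device's share is a multiple of the local size.

```python
def split_range(global_size, local_size, speeds):
    """Split global_size work-items among devices in whole work-groups,
    roughly in proportion to hypothetical integer speed weights."""
    groups = global_size // local_size
    total = sum(speeds)
    # Floor of each device's proportional share, counted in work-groups.
    shares = [groups * s // total for s in speeds]
    # Give the work-groups lost to rounding to the fastest devices first.
    leftover = groups - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    # Convert work-group counts back to work-item counts.
    return [g * local_size for g in shares]


print(split_range(1000, 100, [3, 1, 1]))  # [600, 200, 200]
print(split_range(1000, 100, [2, 1]))     # [700, 300]
```

With only 10 work-groups, the finest adjustment between two devices is 100 work-items, which is why the choice of local size limits how closely the shares can follow the devices' real speeds.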
