
GrCUDA 0.4.0 (multi-GPU support) - June 2022

Released by @gwdidonato on 29 June 2022 (commit e224629)

New features

  • Enabled support for multiple GPUs in the asynchronous scheduler:
    • Added the GrCUDADeviceManager component, which encapsulates the state of the multi-GPU system. It tracks the currently active GPUs, the streams and computations currently active on each GPU, and which data is up-to-date on each device (a minimal sketch of this bookkeeping is shown after this list).
    • Added the GrCUDAStreamPolicy component, which encapsulates new scheduling heuristics to select the best device for each new computation (CUDA streams are uniquely associated with a GPU), using information such as data locality and the current load of each device. We currently support 5 scheduling heuristics of increasing complexity (see the selection sketch after this list):
      • ROUND_ROBIN: simply rotate the scheduling across GPUs; also used as the initialization strategy for the other policies;
      • STREAM_AWARE: assign the computation to the device with the fewest busy streams, i.e. the device with the fewest ongoing computations;
      • MIN_TRANSFER_SIZE: select the device that requires the smallest amount of data to be transferred, maximizing data locality;
      • MINMIN_TRANSFER_TIME: select the device whose minimum total transfer time is the smallest (best-case transfer cost);
      • MINMAX_TRANSFER_TIME: select the device whose maximum total transfer time is the smallest (worst-case transfer cost).
    • Modified the GrCUDAStreamManager component to select streams using the heuristics provided by the policy manager.
    • Extended the CUDARuntime component with APIs for selecting and managing multiple GPUs.
    • Added the possibility to export the computation DAG obtained with a given policy. If the ExportDAG startup option is enabled, the graph is exported in .dot format, just before the context's cleanup, to the path specified by the user as the option's argument (see the usage sketch after this list).
    • Added support for Graal 22.1 and CUDA 11.7.
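
Below is a minimal Java sketch of the per-device bookkeeping described for GrCUDADeviceManager: busy streams and ongoing computations per GPU, plus which devices hold an up-to-date copy of each array. All class, field, and method names here are hypothetical and do not reflect GrCUDA's actual internals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multi-GPU bookkeeping; names and structure do not
// correspond to GrCUDA's actual GrCUDADeviceManager implementation.
class MultiGpuStateSketch {
    // Streams currently busy on each device (device id -> stream ids).
    private final Map<Integer, Set<Long>> busyStreams = new HashMap<>();
    // Number of computations currently running on each device.
    private final Map<Integer, Integer> activeComputations = new HashMap<>();
    // For each managed array, the devices that hold an up-to-date copy.
    private final Map<String, Set<Integer>> upToDateLocations = new HashMap<>();

    void startComputation(int device, long stream) {
        busyStreams.computeIfAbsent(device, d -> new HashSet<>()).add(stream);
        activeComputations.merge(device, 1, Integer::sum);
    }

    void endComputation(int device, long stream, String writtenArray) {
        Set<Long> streams = busyStreams.get(device);
        if (streams != null) {
            streams.remove(stream);
        }
        activeComputations.merge(device, -1, Integer::sum);
        // A write leaves only the executing device with an up-to-date copy.
        upToDateLocations.put(writtenArray, new HashSet<>(Set.of(device)));
    }

    boolean isUpToDateOn(String array, int device) {
        return upToDateLocations.getOrDefault(array, Set.of()).contains(device);
    }

    int ongoingComputations(int device) {
        return activeComputations.getOrDefault(device, 0);
    }
}
```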
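
The three transfer-based selection rules can be summarized as follows. This is only an illustrative sketch: the Device interface and its bytesToTransfer/transferTimesFromSources helpers are assumptions, not GrCUDA's actual GrCUDAStreamPolicy API.

```java
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the MIN_TRANSFER_SIZE, MINMIN_TRANSFER_TIME and
// MINMAX_TRANSFER_TIME selection rules. The Device interface and its
// methods are hypothetical, not GrCUDA's actual GrCUDAStreamPolicy API.
public class DeviceSelectionSketch {

    interface Device {
        // Bytes that would have to be moved to this device to run the computation.
        long bytesToTransfer();

        // Estimated transfer times (e.g. in ms) from each location that currently
        // holds an up-to-date copy of the required data.
        List<Double> transferTimesFromSources();
    }

    // MIN_TRANSFER_SIZE: pick the device needing the fewest bytes moved.
    static Device minTransferSize(List<Device> devices) {
        return devices.stream()
                .min(Comparator.comparingLong(Device::bytesToTransfer))
                .orElseThrow();
    }

    // MINMIN_TRANSFER_TIME: minimize the best-case (minimum) transfer time.
    static Device minMinTransferTime(List<Device> devices) {
        return devices.stream()
                .min(Comparator.comparingDouble(d -> d.transferTimesFromSources()
                        .stream().mapToDouble(Double::doubleValue).min().orElse(0.0)))
                .orElseThrow();
    }

    // MINMAX_TRANSFER_TIME: minimize the worst-case (maximum) transfer time.
    static Device minMaxTransferTime(List<Device> devices) {
        return devices.stream()
                .min(Comparator.comparingDouble(d -> d.transferTimesFromSources()
                        .stream().mapToDouble(Double::doubleValue).max().orElse(0.0)))
                .orElseThrow();
    }
}
```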
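
Finally, a hedged usage sketch through the GraalVM polyglot API: the option names and values below (grcuda.ExecutionPolicy, grcuda.NumberOfGPUs, grcuda.DeviceSelectionPolicy, grcuda.ExportDAG) follow GrCUDA's usual grcuda.<OptionName> convention, but their exact spelling should be checked against the project README for this release.

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class MultiGpuUsageSketch {
    public static void main(String[] args) {
        try (Context ctx = Context.newBuilder()
                .allowAllAccess(true)
                // Option names and values are assumptions following GrCUDA's
                // "grcuda.<OptionName>" convention; verify them in the README.
                .option("grcuda.ExecutionPolicy", "async")
                .option("grcuda.NumberOfGPUs", "2")
                .option("grcuda.DeviceSelectionPolicy", "min-transfer-size")
                .option("grcuda.ExportDAG", "/tmp/grcuda_dag.dot")
                .build()) {
            // Allocate a device array and run computations as usual; the scheduling
            // DAG is written to the .dot file right before the context is cleaned up.
            Value deviceArray = ctx.eval("grcuda", "DeviceArray").execute("float", 1_000_000);
            deviceArray.setArrayElement(0, 42.0f);
        }
    }
}
```

Once the context is closed, the exported .dot file can be rendered with Graphviz (e.g. `dot -Tpdf`) to inspect how the chosen policy distributed the computation DAG across GPUs.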