Dictionary

Hüseyin Tuğrul BÜYÜKIŞIK edited this page May 30, 2017 · 7 revisions

Streaming data: in this project, this means zero-copy access between device and host. Used in both single-GPU and multi-GPU setups.
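A loose Python analogy of the zero-copy idea (not the library's API): instead of copying a buffer between host and device, both sides alias the same memory, so a write on one side is immediately visible on the other.

```python
# Hypothetical zero-copy "streaming" sketch: device_view aliases the host
# buffer's memory, so reads/writes cross host<->device without any copy.

host_buffer = bytearray(8)             # host-allocated memory
device_view = memoryview(host_buffer)  # zero-copy alias, no duplication

host_buffer[0] = 42                    # host writes...
print(device_view[0])                  # ...the "device" sees it immediately -> 42

device_view[1] = 7                     # "device" writes...
print(host_buffer[1])                  # ...the host sees it immediately -> 7
```

In real OpenCL terms this corresponds to mapping a buffer into host address space rather than enqueueing an explicit copy.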

Event-driven pipeline:

  • Overlaps the data read (of all input arrays) with the first kernel execution in the list of kernel names.
  • Computes all intermediate kernels.
  • Overlaps the data write (of all output arrays) with the last kernel execution in the list of kernel names.

Works with both single-GPU and multi-GPU setups.
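The overlap idea above can be sketched with plain threads (an illustration of the scheduling, not the library's event machinery): while chunk i is being computed, the read for chunk i+1 is already in flight.

```python
# Minimal sketch of event-driven overlap: prefetch the next read while the
# current compute runs. read() and compute() are hypothetical stand-ins for
# the host->device transfer and the kernel chain.
from concurrent.futures import ThreadPoolExecutor

def read(i):         # stands in for reading all input arrays of chunk i
    return [i] * 4

def compute(data):   # stands in for the chain of kernels
    return [x * x for x in data]

def pipeline(n):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        next_read = pool.submit(read, 0)              # prefetch first chunk
        for i in range(n):
            data = next_read.result()
            if i + 1 < n:
                next_read = pool.submit(read, i + 1)  # overlap read(i+1)...
            results.append(compute(data))             # ...with compute(i)
    return results

print(pipeline(3))  # [[0, 0, 0, 0], [1, 1, 1, 1], [4, 4, 4, 4]]
```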

Driver-controlled pipeline:

  • Divides all work into smaller read+compute+write operations.
  • Sends them all to the GPU concurrently, so the overlapping behavior is handled by the driver.

Works with both single-GPU and multi-GPU setups.
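A hedged sketch of the driver-controlled idea: the work is split into small read+compute+write tasks that are all submitted at once, and the scheduler (the real GPU driver; here, a thread pool stands in for it) interleaves them however it sees fit.

```python
# Illustrative only: read/compute/write phases are simulated on lists.
from concurrent.futures import ThreadPoolExecutor

def read_compute_write(chunk):
    data = list(chunk)                 # "read" the input chunk
    out = [x + 1 for x in data]        # "compute" on it
    return out                         # "write" the result back

work = list(range(8))
chunks = [work[i:i + 2] for i in range(0, len(work), 2)]  # smaller pieces

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(read_compute_write, chunks))  # all in flight at once

print(results)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```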

Device-to-device pipeline:

  • Assumes only a single GPU per pipeline stage.
  • Data flows through the pipeline one stage at a time.
  • Data exits the pipeline after N steps, where N is the number of stages.
  • GPU compute and GPU-to-GPU data transfers are overlapped; host-to-GPU and GPU-to-host transfers are not overlapped and are serialized with GPU compute.
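The stage-at-a-time flow can be sketched as a shift register (assumed semantics, not the library's implementation): each push moves every in-flight datum forward exactly one stage, and a datum exits after crossing all N stages.

```python
# Each "GPU" is one pipeline stage; compute() is a hypothetical per-stage kernel.
def push(stages, item, compute):
    out = stages[-1]                              # datum leaving the final stage
    for i in range(len(stages) - 1, 0, -1):       # shift everything one stage forward
        stages[i] = compute(stages[i - 1]) if stages[i - 1] is not None else None
    stages[0] = compute(item) if item is not None else None
    return out

stages = [None] * 3                               # N = 3 single-GPU stages
outputs = []
for item in [10, 20, 30, None, None, None]:       # feed 3 items, then flush
    outputs.append(push(stages, item, lambda x: x + 1))

print([o for o in outputs if o is not None])  # [13, 23, 33]
```

Each item picks up one increment per stage and emerges only after N pushes, matching the "exits after N steps" behavior above.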

Enqueue Mode:

  • Meant to optimize single-GPU scenarios. Uses a single command queue for all work.
  • Much less accumulated API overhead over thousands of compute() calls.
  • Async mode enables multiple command queues for different compute() groups in a single enqueue-mode batch.
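The overhead-amortizing idea can be sketched as follows (names like `EnqueueQueue` and `flush` are illustrative, not the library's API): `compute()` only records commands, and a single flush pays the submission cost once for the whole batch instead of once per call.

```python
# Hypothetical enqueue-mode sketch: queue many commands cheaply, submit once.
class EnqueueQueue:
    def __init__(self):
        self.pending = []

    def compute(self, fn, arg):
        self.pending.append((fn, arg))   # cheap: just record the command

    def flush(self):
        # one "submission" for the whole batch, amortizing per-call overhead
        results = [fn(arg) for fn, arg in self.pending]
        self.pending.clear()
        return results

q = EnqueueQueue()
for i in range(4):
    q.compute(lambda x: x * 2, i)        # thousands of these stay cheap

print(q.flush())  # [0, 2, 4, 6]
```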