Dictionary


Hüseyin Tuğrul BÜYÜKIŞIK edited this page May 30, 2017 · 7 revisions

Streaming data: in this project, this means zero-copy access between device and host. It is used in both multi-GPU and single-GPU setups.

Event driven pipeline:

overlaps the data read (of all input arrays) with execution of the first kernel in the list of kernel names.
computes all intermediate kernels.
overlaps the data write (of all output arrays) with execution of the last kernel in the list of kernel names.

works with both multi-GPU and single-GPU setups
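
The three steps above can be sketched as an ordered list of phases, where everything inside one phase runs concurrently. This is only an illustration of the ordering, not the library's implementation: the function name `event_driven_phases`, the phase labels, and the handling of a single-kernel list (read, compute, and write all placed in one phase) are assumptions made for the example.

```python
def event_driven_phases(kernel_names):
    """Return the phases of an event driven pipeline, in order.

    Each phase is a tuple of operations assumed to overlap with each other.
    Hypothetical helper for illustration only.
    """
    if not kernel_names:
        raise ValueError("at least one kernel name is required")

    first, last = kernel_names[0], kernel_names[-1]

    # Assumption: with a single kernel, it is both the first and the last one,
    # so the read and the write both overlap with it.
    if len(kernel_names) == 1:
        return [("read inputs", first, "write outputs")]

    # Phase 1: reading all input arrays overlaps with the first kernel.
    phases = [("read inputs", first)]
    # Middle phases: intermediate kernels run one after another.
    phases += [(name,) for name in kernel_names[1:-1]]
    # Final phase: the last kernel overlaps with writing all output arrays.
    phases.append((last, "write outputs"))
    return phases


if __name__ == "__main__":
    for phase in event_driven_phases(["blur", "sharpen", "tonemap"]):
        print(" || ".join(phase))
```

Only the first and last kernels hide transfer time in this model; the intermediate kernels work on data that is already on the device.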

Driver controlled pipeline:

divides all work into smaller read+compute+write operations
sends all of them to the GPU concurrently, leaving the overlapping behavior to the driver

works with both multi-GPU and single-GPU setups
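
A minimal sketch of the work division described above, assuming the work is a flat range of items that divides evenly into chunks. The names `split_work` and `driver_controlled_ops` are hypothetical, and giving each chunk its own queue id is an assumption made to show how the operations can be submitted independently; the actual overlap is decided by the driver and is not modeled here.

```python
def split_work(total_items, chunk_count):
    """Split a range of work items into (offset, size) chunks of equal size."""
    if chunk_count <= 0 or total_items % chunk_count != 0:
        raise ValueError("total_items must be a positive multiple of chunk_count")
    size = total_items // chunk_count
    return [(i * size, size) for i in range(chunk_count)]


def driver_controlled_ops(total_items, chunk_count):
    """Build one read+compute+write sequence per chunk.

    Assumption: each chunk is tagged with its own queue id, so no chunk
    waits on another and the driver is free to overlap them.
    """
    ops = []
    for queue_id, (offset, size) in enumerate(split_work(total_items, chunk_count)):
        ops.append((queue_id, [
            ("read", offset, size),
            ("compute", offset, size),
            ("write", offset, size),
        ]))
    return ops


if __name__ == "__main__":
    for queue_id, sequence in driver_controlled_ops(1024, 4):
        print(queue_id, sequence)
```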

Device to device pipeline:

only a single GPU per pipeline stage is assumed
data moves through the pipeline only one stage at a time
data exits the pipeline after N steps
GPU compute and GPU-to-GPU data transfers are overlapped. Host-to-GPU and GPU-to-host transfers are not overlapped; they are serialized with GPU compute.
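
The stage-by-stage movement can be illustrated with a small host-side simulation. It is a sketch under stated assumptions, not the library's code: `run_pipeline` is a hypothetical name, each stage is a plain Python function standing in for one GPU, N is taken to be the number of stages, and `None` marks an empty stage, so the inputs themselves must not be `None`.

```python
def run_pipeline(stages, inputs):
    """Push inputs through the stages, moving data one stage per step.

    slots[i] holds the result currently sitting in stage i (None if empty).
    """
    n = len(stages)
    slots = [None] * n
    outputs = []

    # After the real inputs, feed n empty steps to flush the pipeline.
    for item in list(inputs) + [None] * n:
        # Whatever sits in the last stage leaves the pipeline on this step.
        if slots[-1] is not None:
            outputs.append(slots[-1])
        # Move data forward by exactly one stage, from the back to the front,
        # so that no value advances more than one stage per step.
        for i in range(n - 1, 0, -1):
            slots[i] = stages[i](slots[i - 1]) if slots[i - 1] is not None else None
        # The new item (if any) enters the first stage.
        slots[0] = stages[0](item) if item is not None else None
    return outputs


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(stages, [1, 2, 3]))
```

In this model an item that enters on one step is collected N steps later, and all stages work on different items during the same step.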

Enqueue Mode:

Meant to optimize single-GPU scenarios. Uses a single command queue for all work.
Much less API overhead accumulates over thousands of compute() calls.
Async mode enables multiple command queues for different compute() groups within a single enqueue mode batch.
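
The overhead argument can be made concrete with a toy model. The sketch below assumes that the per-call cost comes mainly from synchronizing with the device after every compute(); `FakeQueue`, `run_immediate`, and `run_enqueue_mode` are hypothetical names, and the queue only counts synchronizations instead of talking to a real device.

```python
class FakeQueue:
    """Stand-in for a single command queue; counts host/device syncs."""

    def __init__(self):
        self.pending = []
        self.sync_count = 0

    def enqueue(self, name):
        # Queue the work without waiting for it.
        self.pending.append(name)

    def finish(self):
        # Wait for everything queued so far; this is the costly part
        # in this model.
        self.sync_count += 1
        done, self.pending = self.pending, []
        return done


def run_immediate(queue, names):
    """Synchronize after every call (no enqueue mode)."""
    for name in names:
        queue.enqueue(name)
        queue.finish()


def run_enqueue_mode(queue, names):
    """Queue everything first, then synchronize once at the end."""
    for name in names:
        queue.enqueue(name)
    queue.finish()


if __name__ == "__main__":
    names = ["kernel"] * 1000
    a, b = FakeQueue(), FakeQueue()
    run_immediate(a, names)
    run_enqueue_mode(b, names)
    print(a.sync_count, b.sync_count)
```

Under this assumption the number of synchronizations drops from one per call to one per batch, which is where the saving over thousands of calls would come from.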