Dictionary
Hüseyin Tuğrul BÜYÜKIŞIK edited this page May 30, 2017
Streaming data: in this project, this means zero-copy access between device and host memory, so the device works on host arrays directly instead of on copies. Used in both multi-GPU and single-GPU setups.
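Zero-copy can be illustrated without a GPU. The sketch below is only an analogy, not this project's API: Python's `memoryview` stands in for a device mapping of host memory, and `kernel_double` is a hypothetical kernel. A kernel that works on a copy leaves the host array untouched until the result is copied back, while a kernel that works through a view changes the host array directly.

```python
import array

def kernel_double(buf):
    """Hypothetical kernel: doubles every element of `buf` in place."""
    for i in range(len(buf)):
        buf[i] = buf[i] * 2.0

# Host-side array of 32-bit floats.
host = array.array("f", [1.0, 2.0, 3.0, 4.0])

# Copy-based access: the "device" gets its own buffer.
copied = array.array("f", host)
kernel_double(copied)
after_copy = list(host)   # [1.0, 2.0, 3.0, 4.0]: host unchanged, a copy-back would be needed

# Zero-copy access: memoryview shares the host memory, nothing is duplicated.
view = memoryview(host)
kernel_double(view)
print(list(host))  # [2.0, 4.0, 6.0, 8.0]
```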
Event-driven pipeline:
- overlaps the data read (of all input arrays) with execution of the first kernel in the list of kernel names.
- computes all intermediate kernels.
- overlaps the data write (of all output arrays) with execution of the last kernel in the list of kernel names.
Works with both multi-GPU and single-GPU setups.
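The effect of this overlap can be reasoned about with a small timing model. The sketch below is a hypothetical model, not this project's implementation: it assumes the arrays are split into equal chunks, that reads, kernel execution and writes run on three independent engines, and that every command waits only on the completion events of the commands it depends on. `schedule` and `build_ops` are made-up names; durations are in arbitrary time units.

```python
def schedule(ops):
    """Return the finish time of `ops`, given in issue order.

    Each op is (name, engine, duration, waits_on). An op starts when its
    engine (an in-order queue) is idle AND every event it waits on is done.
    """
    engine_free = {}   # engine -> time it becomes idle
    done = {}          # op name -> completion time (its "event")
    for name, engine, duration, waits_on in ops:
        start = max([engine_free.get(engine, 0)] + [done[w] for w in waits_on])
        done[name] = start + duration
        engine_free[engine] = done[name]
    return max(done.values())


def build_ops(chunks, mid_kernel_time, overlapped):
    """Hypothetical 3-kernel job; every per-chunk op takes 1 time unit.

    overlapped=True  -> reads, compute and writes use separate engines (event driven)
    overlapped=False -> everything shares one engine, i.e. fully serialized
    """
    copy_in, compute, copy_out = ("in", "gpu", "out") if overlapped else ("q", "q", "q")
    ops = []
    for c in range(chunks):   # read every input chunk
        ops.append((f"read{c}", copy_in, 1, []))
    for c in range(chunks):   # first kernel on a chunk waits only on that chunk's read
        ops.append((f"first{c}", compute, 1, [f"read{c}"]))
    # intermediate kernel runs on the whole data set after all first-kernel chunks
    ops.append(("mid", compute, mid_kernel_time, [f"first{c}" for c in range(chunks)]))
    for c in range(chunks):   # last kernel, per chunk
        ops.append((f"last{c}", compute, 1, ["mid"]))
    for c in range(chunks):   # write a chunk as soon as its last kernel is done
        ops.append((f"write{c}", copy_out, 1, [f"last{c}"]))
    return ops


serial = schedule(build_ops(chunks=4, mid_kernel_time=3, overlapped=False))
evented = schedule(build_ops(chunks=4, mid_kernel_time=3, overlapped=True))
print(serial, evented)  # 19 13
```

In this model the saving comes only from hiding reads behind the first kernel and writes behind the last one; the intermediate kernel gets no overlap.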
Driver-controlled pipeline:
- divides all work into smaller read+compute+write operations.
- sends them all to the GPU concurrently, leaving it to the driver to decide how they overlap.
Works with both multi-GPU and single-GPU setups.
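A rough host-side analogy, assuming Python's `concurrent.futures` thread pool as the stand-in for the driver: the work is cut into independent read+compute+write units, all units are submitted at once, and the pool, not the caller, decides how they interleave. `process_chunk`, `driver_controlled` and the squaring kernel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(src, dst, start, end):
    """One independent read+compute+write unit for elements [start, end)."""
    local = src[start:end]             # "read": copy the chunk in
    local = [x * x for x in local]     # "compute": hypothetical kernel (square)
    dst[start:end] = local             # "write": copy the result back
    return start

def driver_controlled(src, chunk_size, workers=4):
    """Split the work into chunks and submit every unit at once."""
    dst = [0] * len(src)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_chunk, src, dst, s, min(s + chunk_size, len(src)))
                   for s in range(0, len(src), chunk_size)]
        for f in futures:
            f.result()  # wait for every unit and re-raise any error
    return dst

result = driver_controlled(list(range(10)), chunk_size=3)
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The caller only controls the chunk size; the order in which units actually run is up to the pool, which mirrors handing overlap decisions to the driver.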
Device-to-device pipeline:
- assumes a single GPU per pipeline stage.
- data advances through the pipeline one stage at a time.
- data exits the pipeline after N steps, N being the number of stages.
- GPU compute and GPU-to-GPU data transfers are overlapped; host-to-GPU and GPU-to-host transfers are not overlapped and are serialized with GPU compute.
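The stage-by-stage flow can be modeled as below. This is a hypothetical sketch, not the project's API: each stage is a plain function standing in for one GPU, `push` advances every item by exactly one stage, and N is taken to be the number of stages, so an item leaves the pipeline on the N-th push counting the one that inserted it. Each step reads the previous step's buffer and writes into a new one; that double buffering is what allows a real pipeline to overlap stage compute with GPU-to-GPU transfers. Host transfers are not modeled.

```python
class DeviceToDevicePipeline:
    """Hypothetical N-stage pipeline; each stage stands in for one GPU."""

    def __init__(self, stages):
        self.stages = list(stages)
        # held[i] is the output of stage i from the previous push,
        # waiting to be handed to stage i + 1 (None marks an empty slot)
        self.held = [None] * (len(self.stages) - 1)

    def push(self, item):
        """Feed `item` to stage 0, advance everything one stage, return what exits."""
        inputs = [item] + self.held
        # All stages compute "at once" on the previous step's buffers and write into
        # a new list (double buffering), so no stage sees this step's results.
        outputs = [stage(x) if x is not None else None
                   for stage, x in zip(self.stages, inputs)]
        self.held = outputs[:-1]   # hand results to the next stage (GPU-to-GPU move)
        return outputs[-1]         # output of the last stage leaves the pipeline


pipe = DeviceToDevicePipeline([lambda v: v + 1, lambda v: v * 2, lambda v: v - 3])
exits = [pipe.push(v) for v in [1, 2, 3, 4, 5]]
print(exits)  # [None, None, 1, 3, 5]
```

With three stages, the first result appears on the third push. Pushing `None` drains the pipeline: the next `pipe.push(None)` returns 7, the result for input 4.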
Enqueue mode:
- Meant to optimize single-GPU scenarios; uses a single command queue for all work.
- Accumulates much less API overhead over thousands of compute() calls.
- Async mode enables multiple command queues for different groups of compute() calls within a single enqueue-mode batch.
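The saving can be illustrated by counting host-device round trips. The sketch below is a hypothetical model, not the project's API: `ToyQueue`, `per_call`, `enqueue_mode` and `async_enqueue_mode` are made-up names, and `finish()` stands for one blocking synchronization with the device. Without enqueue mode every compute() call pays that cost; in enqueue mode a whole batch pays it once, and the async variant pays it once per queue.

```python
class ToyQueue:
    """Hypothetical in-order command queue; finish() models one blocking round trip."""

    def __init__(self, log):
        self.pending = []
        self.log = log          # shared record of executed commands
        self.round_trips = 0

    def enqueue(self, name):
        self.pending.append(name)       # cheap: the host does not wait here

    def finish(self):
        self.log.extend(self.pending)   # "execute" everything queued so far, in order
        self.pending.clear()
        self.round_trips += 1           # the host blocks once for the whole batch


def per_call(names):
    """Without enqueue mode: every compute() call waits for its own completion."""
    log = []
    q = ToyQueue(log)
    for n in names:
        q.enqueue(n)
        q.finish()
    return log, q.round_trips


def enqueue_mode(names):
    """Enqueue mode: all compute() calls share one queue and synchronize once."""
    log = []
    q = ToyQueue(log)
    for n in names:
        q.enqueue(n)
    q.finish()
    return log, q.round_trips


def async_enqueue_mode(groups):
    """Async variant: one queue per group of compute() calls, each finished once."""
    log = []
    queues = [ToyQueue(log) for _ in groups]
    for q, group in zip(queues, groups):
        for n in group:
            q.enqueue(n)
    for q in queues:
        q.finish()
    return log, sum(q.round_trips for q in queues)


names = [f"compute{i}" for i in range(1000)]
print(per_call(names)[1], enqueue_mode(names)[1])  # 1000 1
```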