Commit ea6ed38 (gshtras, Mar 25, 2024): Overview of the optional performance features that are yet to be upstreamed
# Overview of the optional performance features unique to https://github.com/ROCm/vllm
## Multi-GPU torchrun
On ROCm the default multi-GPU executor is `torchrun`, as opposed to `ray` on NVIDIA.
This can be overridden with the `--worker-use-ray` flag to vllm or its benchmarks.
To utilize torchrun parallelism, the run command should be modified from
`python <command>`
to
`torchrun --standalone --nnodes=1 --nproc-per-node=<world-size> <command>`
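As a concrete sketch, assuming a single node with 8 GPUs and the standard vLLM throughput benchmark script (the script path and flags below are assumptions, not part of this commit):

```shell
# Single-node run across 8 GPUs via torchrun instead of ray.
# --standalone: no external rendezvous backend needed for one node.
torchrun --standalone --nnodes=1 --nproc-per-node=8 \
    benchmarks/benchmark_throughput.py \
    --model <model> --tensor-parallel-size 8
```

Note that `--nproc-per-node` should match the tensor-parallel world size, so one worker process is launched per GPU.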
## Triton attention
The default attention function on ROCm uses a Triton attention kernel. To fall back to the https://github.com/ROCm/flash-attention implementation, set the following environment variable:
`VLLM_USE_FLASH_ATTN_TRITON=False`
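For example, to launch a server with the flash-attention backend instead of the Triton kernel (the entrypoint shown is a common vLLM invocation, assumed here for illustration):

```shell
# Disable the Triton attention kernel for this process only;
# vLLM will then use the ROCm flash-attention implementation.
VLLM_USE_FLASH_ATTN_TRITON=False python -m vllm.entrypoints.api_server --model <model>
```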
## Tunable ops
PyTorch tunable ops are supported.
Define the following environment variable: `PYTORCH_TUNABLEOP_ENABLED=1` in order to enable both the runtime tuning and the subsequent use of tuned results. To only use the tuned results without tuning any newly encountered shapes, also define `PYTORCH_TUNABLEOP_TUNING=0`.
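A typical two-phase workflow might look like the following (the tuned-results CSV filename is PyTorch's default and can be changed via `PYTORCH_TUNABLEOP_FILENAME`; treat the exact filename as an assumption for your PyTorch version):

```shell
# Phase 1: tuning run. Newly encountered GEMM shapes are tuned and the
# winning kernel choices are recorded (by default to tunableop_results*.csv).
PYTORCH_TUNABLEOP_ENABLED=1 python <command>

# Phase 2: production runs. Reuse the recorded results; do not spend time
# tuning any shapes that were not seen during phase 1.
PYTORCH_TUNABLEOP_ENABLED=1 PYTORCH_TUNABLEOP_TUNING=0 python <command>
```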
