Update ROCm_performance.md
gshtras authored Mar 25, 2024
1 parent ea6ed38 commit 6c34e32
Showing 1 changed file with 1 addition and 1 deletion.
ROCm_performance.md (2 changes: 1 addition & 1 deletion)
@@ -5,7 +5,7 @@ This can be overridden by the `--worker-use-ray` flag to vllm or its benchmarks
To utilize torchrun parallelism, the run command should be modified from
`python <command>`
to
-`torchrun --standalone --nnodes=1 --nproc-per-node=<workd-size> <command>`
+`torchrun --standalone --nnodes=1 --nproc-per-node=<world-size> <command>`
## Triton attention
The default attention function on ROCm uses the Triton attention kernel. To fall back to the https://github.com/ROCm/flash-attention implementation, set the following environment variable:
`VLLM_USE_FLASH_ATTN_TRITON=False`
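
For concreteness, a sketch of the torchrun invocation described above, assuming a single node with 8 GPUs and the vLLM latency benchmark script (both the GPU count and the script path are assumptions, not part of this commit):

`torchrun --standalone --nnodes=1 --nproc-per-node=8 benchmarks/benchmark_latency.py`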
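
Likewise, a minimal usage sketch for the Triton attention setting, with the variable set inline for a single run and reusing the `<command>` placeholder from the diff above:

`VLLM_USE_FLASH_ATTN_TRITON=False python <command>`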
