Add warning on CUDA graph memory usage (vllm-project#2182)
WoosukKwon authored Dec 19, 2023
1 parent 290e015 · commit 21d5daa
Showing 1 changed file with 3 additions and 0 deletions.
vllm/worker/model_runner.py (+3, −0)
@@ -395,6 +395,9 @@ def capture_model(self, kv_caches: List[KVCache]) -> None:
                     "unexpected consequences if the model is not static. To "
                     "run the model in eager mode, set 'enforce_eager=True' or "
                     "use '--enforce-eager' in the CLI.")
+        logger.info("CUDA graphs can take additional 1~3 GiB memory per GPU. "
+                    "If you are running out of memory, consider decreasing "
+                    "`gpu_memory_utilization` or enforcing eager mode.")
         start_time = time.perf_counter()

         # Prepare dummy inputs. These will be reused for all batch sizes.
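
The new warning points users at two existing knobs. As a minimal, hypothetical sketch of acting on it from the Python API (the model name and the 0.85 value are illustrative; `gpu_memory_utilization` and `enforce_eager` are the constructor arguments the warning refers to):

from vllm import LLM, SamplingParams

# Lower gpu_memory_utilization to leave headroom for CUDA graph capture,
# or set enforce_eager=True to skip graph capture entirely.
llm = LLM(
    model="facebook/opt-125m",       # illustrative model choice
    gpu_memory_utilization=0.85,     # default is 0.9; lower it if capture OOMs
    # enforce_eager=True,            # uncomment to disable CUDA graphs
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)

The same trade-off is available on the command line via --enforce-eager or --gpu-memory-utilization when launching the server.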
