feat: use async vllm engine (only used in unit tests) #418

parthchadha · 2025-05-20T00:09:47Z

What does this PR do ?

This PR adds the capability to use the async vLLM engine and verify its correctness in unit tests. This is the first of N PRs that will enable async processing of rollout requests.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Signed-off-by: Parth Chadha <[email protected]>

terrykong · 2025-05-20T19:17:22Z

examples/configs/grpo_math_1B.yaml

Do you mind adding this key to all the GRPO recipes under examples/config/recipes/grpo*.yaml?

terrykong · 2025-05-20T19:37:45Z

nemo_rl/models/generation/vllm.py

-        # Reset the prefix cache to ensure that prefix cache is not reused after weights are updated
-        self.llm.llm_engine.reset_prefix_cache()
-        self.llm.sleep(level=1)
+    async def sleep(self):


I don't see a synchronous analog to some of these async functions. Is the expectation that users of LLMEngine still await these calls, or should we provide a synchronous variant?
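For illustration, one possible shape for a synchronous variant is sketched below. This is only a sketch under stated assumptions: the class `AsyncWorkerSketch` and the method `sleep_sync` are hypothetical names that do not come from this PR, and the real engine calls are replaced by a placeholder so the snippet runs without vLLM installed.

```python
import asyncio


class AsyncWorkerSketch:
    """Hypothetical stand-in for a worker that wraps an async engine."""

    def __init__(self):
        self.asleep = False
        self.prefix_cache_resets = 0

    async def sleep(self):
        # Same order as the removed synchronous code: reset the prefix cache
        # so it is not reused after weights are updated, then put the engine
        # to sleep. Both engine calls are simulated here (assumption).
        await asyncio.sleep(0)  # placeholder for an awaitable engine call
        self.prefix_cache_resets += 1
        self.asleep = True

    def sleep_sync(self):
        # Synchronous variant: run the coroutine to completion.
        # asyncio.run raises RuntimeError when called from inside a
        # running event loop, so async callers should await sleep() instead.
        asyncio.run(self.sleep())


worker = AsyncWorkerSketch()
worker.sleep_sync()
print(worker.asleep, worker.prefix_cache_resets)
```

With a thin wrapper like this, callers that are not inside an event loop get a blocking call, while async callers keep awaiting `sleep()` directly.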

parthchadha added 2 commits May 19, 2025 17:03

feat: add async vllm engine for verifying its correctness in unit tests

3b6e457

Signed-off-by: Parth Chadha <[email protected]>

Merge remote-tracking branch 'origin/main' into pchadha/async-vllm

c1d61a3

parthchadha added the CI:L0 Run doctests and unit tests label May 20, 2025

parthchadha temporarily deployed to nemo-ci May 20, 2025 00:10 — with GitHub Actions Inactive

parthchadha requested review from terrykong and SahilJain314 May 20, 2025 14:31

parthchadha added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels May 20, 2025

parthchadha temporarily deployed to nemo-ci May 20, 2025 14:32 — with GitHub Actions Inactive

fix: Fix failing unit test

fb47ba9

Signed-off-by: Parth Chadha <[email protected]>

parthchadha force-pushed the pchadha/async-vllm branch from 09c45a8 to fb47ba9 Compare May 20, 2025 15:13

parthchadha added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels May 20, 2025

parthchadha temporarily deployed to nemo-ci May 20, 2025 15:15 — with GitHub Actions Inactive

terrykong reviewed May 20, 2025
