feat: Add IFEval Environment #412

Open · wants to merge 3 commits into base: geshen/dset

Conversation

abukharin-nv (Collaborator)

What does this PR do?

Adds IFEval environment and script.
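IFEval rewards a response by programmatically checking verifiable instructions (for example, "mention a keyword at least N times"). As a rough illustration of the kind of check such an environment runs, here is a simplified stand-in, not this PR's code:

```python
import re

def check_keyword_frequency(response: str, keyword: str, min_count: int) -> bool:
    """Toy IFEval-style verifier: does the response contain 'keyword'
    at least 'min_count' times (case-insensitive, whole-word match)?

    The real IFEval suite bundles many such checkers; this function is
    a simplified stand-in for illustration, not the PR's implementation.
    """
    hits = re.findall(rf"\b{re.escape(keyword)}\b", response, flags=re.IGNORECASE)
    return len(hits) >= min_count
```

A response passes or fails each instruction independently; the environment aggregates these booleans into a reward.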

Issues

List issues that this PR closes:

Usage

  • Example:

    uv pip install absl-py langdetect nltk==3.9.1 immutabledict \
    && uv run examples/run_grpo_ifeval.py \

Before your PR is "Ready for review"

Pre checks:

  • [Y] Make sure you read and followed Contributor guidelines
  • [N] Did you write any new necessary tests?
  • [N] Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • [N] Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Signed-off-by: abukharin-nv <[email protected]>
@abukharin-nv changed the title from "Add IFEval Environment" to "feat: Add IFEval Environment" on May 19, 2025
@@ -58,8 +58,8 @@ policy:
# training and logprob stages respectively.
dynamic_batching:
enabled: True
train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
train_mb_tokens: 2048
Collaborator:

Is there a particular reason for this change?

@@ -126,6 +126,8 @@ data:
env:
math:
num_workers: 8
ifeval:
Collaborator:

can this be moved to a derivative config?



@ray.remote
class HFVerifyWorker:
Collaborator:

Suggested change:
- class HFVerifyWorker:
+ class IFVerifyWorker:

Collaborator:

(nit)


download_punkt_tab()

WORD_LIST = [
Collaborator:

@terrykong do you have an opinion on making this sort of 'data' list be in a file that's not .py? It being in this file might make searching harder (i frequently do search with *.py filter).

Collaborator:

I'm indifferent. Okay as data or in python

@SahilJain314 (Collaborator):

Can we add the requirements for ifeval

uv pip install absl-py langdetect nltk==3.9.1 immutabledict \
&& uv run examples/run_grpo_ifeval.py \

to the pyproject in a separate group? Then use the uv environment method to run the ifeval workers with the correct deps? It's for this purpose.

pyproject.toml Outdated
@@ -67,6 +67,12 @@ test = [
"pytest-timeout",
"pytest-cov",
]
ifeval = [
Collaborator:

Can this be moved to the [project.optional-dependencies] (like vllm above)?
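For concreteness, a sketch of what that could look like in pyproject.toml (the group name and pins are taken from the install command in the PR description; exact placement depends on the existing file):

```toml
[project.optional-dependencies]
ifeval = [
    "absl-py",
    "langdetect",
    "nltk==3.9.1",
    "immutabledict",
]
```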

Then, when you want to run an ifeval environment and need its deps, you should run that Ray remote with a PY_EXECUTABLE that points to that optional dependency group, like the following for vLLM:

Collaborator:

I assume the original is from here? https://github.com/google-research/google-research/tree/master/instruction_following_eval

@terrykong how should we include this? I can see why we may want to copy/paste it here, since the google-research GitHub repo is huge, but we probably need to be careful license-wise?

Collaborator:

Let me check

Collaborator:

OOC, @abukharin-nv could you depend on https://github.com/EleutherAI/lm-evaluation-harness instead? looks like they copy in the right stuff and they also have other eval

Collaborator (Author):

Great idea. I will do that

Collaborator:

I think it's okay that it's copied since the google-research repo had a permissive license. If you think lm-eval-harness works for us, then it's probably a more convenient repo to depend on since we just need to "track" that one as a top level dependency.

Are there other things in lm-eval-harness that you think we'd eventually make use of? If not, let's just deal with the ifeval source

Signed-off-by: abukharin-nv <[email protected]>
print("Max retries reached. Could not download punkt_tab.")


download_punkt_tab()
Collaborator:

Is this multiprocessing-safe? Curious whether, if several environments import this module, it will cause issues.
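One way to make an import-time download idempotent across processes is an atomic marker file. A minimal sketch, with `download_fn` standing in for the `nltk.download("punkt_tab")` call; the names here are hypothetical, not the PR's:

```python
import os
import tempfile  # used only in the usage example


def download_once(marker_path: str, download_fn) -> bool:
    """Run download_fn at most once across processes.

    os.open with O_CREAT | O_EXCL creates the marker file atomically,
    so only the first process performs the download; later callers
    (or repeated module imports) skip it. Note this does not wait for
    an in-flight download to finish, so it is a sketch, not a full lock.
    """
    try:
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # already downloaded (or a download is in progress)
    try:
        download_fn()
    except Exception:
        os.unlink(marker_path)  # allow a retry after a failed download
        raise
    finally:
        os.close(fd)
    return True
```

Calling it twice with the same marker path performs the download only on the first call.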

@@ -58,6 +58,8 @@ policy:
# training and logprob stages respectively.
dynamic_batching:
enabled: True
# train_mb_tokens: 2048
Collaborator:

Remove dead code?

}


def conflict_make(conflicts):
Collaborator:

Is ensure_bidirectional_conflict a better name?
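Judging from the suggested name, conflict_make presumably symmetrizes a conflict table: if instruction a conflicts with b, then b must also conflict with a. A guessed sketch with hypothetical names, not the PR's code:

```python
def ensure_bidirectional_conflict(conflicts: dict) -> dict:
    """Return a copy of 'conflicts' (instruction id -> set of ids it
    cannot co-occur with) where every conflict is symmetric: if b is
    in conflicts[a], then a is also in conflicts[b].

    This is a guess at what conflict_make does, based on the suggested
    rename; the PR's actual logic may differ.
    """
    out = {key: set(values) for key, values in conflicts.items()}
    for key, values in conflicts.items():
        for other in values:
            out.setdefault(other, set()).add(key)
    return out
```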

results = []
for response, prompt, metadata in zip(pred_responses, prompts, metadata):
try:
# with _mute_output():
Collaborator:

Suggested change (delete the commented-out line):
- # with _mute_output():

chunked_prompts = chunk_list_to_workers(prompts, self.num_workers)
chunked_metadata = chunk_list_to_workers(metadata, self.num_workers)

# # Process each chunk in parallel
Collaborator:

Suggested change:
- # # Process each chunk in parallel
+ # Process each chunk in parallel
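chunk_list_to_workers itself is not shown in this diff; a plausible sketch of such a helper (contiguous chunks whose sizes differ by at most one), offered as an assumption about its behavior rather than the actual implementation:

```python
def chunk_list_to_workers(items: list, num_workers: int) -> list:
    """Split 'items' into num_workers contiguous chunks of near-equal
    size: the first len(items) % num_workers chunks each get one extra
    element, so chunk sizes differ by at most one.

    A sketch of the helper the PR calls; the real one may differ.
    """
    base, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```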

@terrykong (Collaborator) left a comment:

could you please add tests?

packages = ["absl-py", "langdetect", "nltk==3.9.1", "immutabledict"]
# Run pip install for each package
for package in packages:
subprocess.run(["python3", "-m", "pip", "install", package])
Collaborator:

can you comment this stuff out? this isn't necessary in nemo-rl

5 participants