-
Notifications
You must be signed in to change notification settings - Fork 25
feat: Add IFEval Environment #412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: geshen/dset
Are you sure you want to change the base?
Conversation
Signed-off-by: abukharin-nv <[email protected]>
examples/configs/grpo_math_1B.yaml
Outdated
@@ -58,8 +58,8 @@ policy: | |||
# training and logprob stages respectively. | |||
dynamic_batching: | |||
enabled: True | |||
train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}} | |||
logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}} | |||
train_mb_tokens: 2048 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason for this change?
examples/configs/grpo_math_1B.yaml
Outdated
@@ -126,6 +126,8 @@ data: | |||
env: | |||
math: | |||
num_workers: 8 | |||
ifeval: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be moved to a derivative config?
|
||
|
||
@ray.remote | ||
class HFVerifyWorker: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class HFVerifyWorker: | |
class IFVerifyWorker: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit)
|
||
download_punkt_tab() | ||
|
||
WORD_LIST = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@terrykong do you have an opinion on making this sort of 'data' list be in a file that's not .py? It being in this file might make searching harder (i frequently do search with *.py filter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm indifferent. Okay as data or in python
Can we add the requirements for ifeval uv pip install absl-py langdetect nltk==3.9.1 immutabledict \
&& uv run examples/run_grpo_ifeval.py \ to the pyproject in a separate group? Then use the uv environment method to run the ifeval workers with the correct deps? It's for this purpose. |
Signed-off-by: abukharin-nv <[email protected]>
pyproject.toml
Outdated
@@ -67,6 +67,12 @@ test = [ | |||
"pytest-timeout", | |||
"pytest-cov", | |||
] | |||
ifeval = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved to the [project.optional-dependencies] (like vllm above)?
Then, when you want to run an ifeval environment/need it's deps, you should run that ray remote with a PY_EXECUTABLE that points to that optional dep. Like the following for VLLM:
-
specify environment with --extra ifeval
https://github.com/NVIDIA/NeMo-RL/blob/main/nemo_rl/distributed/virtual_cluster.py#L46 -
launch worker with this py_executable
https://github.com/NVIDIA/NeMo-RL/blob/main/nemo_rl/models/generation/vllm.py#L52
(your sub-workers will also need it. See ifeval_environment L147)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the original is from here? https://github.com/google-research/google-research/tree/master/instruction_following_eval
@terrykong how should we include this? I can see why we may want to paste/copy it here since the googleresearch github is huge, but we probably need to be careful license wise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC, @abukharin-nv could you depend on https://github.com/EleutherAI/lm-evaluation-harness instead? looks like they copy in the right stuff and they also have other eval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea. I will do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@terrykong looks like there may be the same issue copying from EleutherAI: https://github.com/EleutherAI/lm-evaluation-harness/blob/29ea6832cd913b055ec1d6962180c773e8a7ac88/lm_eval/tasks/ifeval/instructions_registry.py#L1
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay that it's copied since the google-research repo had a permissive license. If you think lm-eval-harness works for us, then it's probably a more convenient repo to depend on since we just need to "track" that one as a top level dependency.
Are there other things in lm-eval-harness that you think we'd eventually make use of? If not, let's just deal with the ifeval source
Signed-off-by: abukharin-nv <[email protected]>
print("Max retries reached. Could not download punkt_tab.") | ||
|
||
|
||
download_punkt_tab() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this multiprocessing safe? curious if several environments import module, will it cause issues?
examples/configs/grpo_math_1B.yaml
Outdated
@@ -58,6 +58,8 @@ policy: | |||
# training and logprob stages respectively. | |||
dynamic_batching: | |||
enabled: True | |||
# train_mb_tokens: 2048 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove dead code?
} | ||
|
||
|
||
def conflict_make(conflicts): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ensure_bidirectional_conflict
a better name?
results = [] | ||
for response, prompt, metadata in zip(pred_responses, prompts, metadata): | ||
try: | ||
# with _mute_output(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# with _mute_output(): |
chunked_prompts = chunk_list_to_workers(prompts, self.num_workers) | ||
chunked_metadata = chunk_list_to_workers(metadata, self.num_workers) | ||
|
||
# # Process each chunk in parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# # Process each chunk in parallel | |
# Process each chunk in parallel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add tests?
packages = ["absl-py", "langdetect", "nltk==3.9.1", "immutabledict"] | ||
# Run pip install for each package | ||
for package in packages: | ||
subprocess.run(["python3", "-m", "pip", "install", package]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment this stuff out? this isn't necessary in nemo-rl
What does this PR do ?
Adds IFEval environment and script.
Issues
List issues that this PR closes (syntax):
Usage
uv pip install absl-py langdetect nltk==3.9.1 immutabledict \ && uv run examples/run_grpo_ifeval.py \
Before your PR is "Ready for review"
Pre checks:
Additional Information