🎲 [GRPO] Shuffle mini batches #3391

edbeeching · 2025-04-29T14:36:52Z

This PR adds minibatch shuffling to ensure that the prompts are not ordered before the effective batch is split into chunks.
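As a rough illustration of the idea (not the PR's actual diff), the sketch below shuffles a dict of tensors with one shared permutation and then splits it into equal chunks. The dict comprehension mirrors the `shuffle_tensor_dict` line visible in the traceback later in this thread. The `split_tensor_dict` signature, the key names, and the toy batch are assumptions made for the example.

```python
import torch


def shuffle_tensor_dict(tensor_dict):
    """Shuffle every tensor along dim 0 using one shared permutation."""
    # Assumption: all non-None tensors share the same first (batch) dimension.
    batch_size = next(t.shape[0] for t in tensor_dict.values() if t is not None)
    permutation = torch.randperm(batch_size)
    return {k: t[permutation] if t is not None else None for k, t in tensor_dict.items()}


def split_tensor_dict(tensor_dict, num_chunks):
    """Split every tensor along dim 0 into `num_chunks` equal slices (hypothetical signature)."""
    batch_size = next(t.shape[0] for t in tensor_dict.values() if t is not None)
    chunk_size = batch_size // num_chunks
    return [
        {
            k: t[i * chunk_size : (i + 1) * chunk_size] if t is not None else None
            for k, t in tensor_dict.items()
        }
        for i in range(num_chunks)
    ]


# Toy batch: row i of "completion_ids" belongs to row i of "prompt_ids".
batch = {
    "prompt_ids": torch.arange(8).unsqueeze(1),
    "completion_ids": torch.arange(8).unsqueeze(1) * 10,
    "ref_logps": None,  # optional entries stay None
}
shuffled = shuffle_tensor_dict(batch)
chunks = split_tensor_dict(shuffled, num_chunks=4)
```

Because the same permutation indexes every tensor, each prompt row still sits next to its own completion row after shuffling, and each chunk is a random rather than ordered slice of the effective batch.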

HuggingFaceDocBuilderDev · 2025-04-29T14:41:39Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

shirinyamani · 2025-04-29T16:41:08Z

I'm sure this prompt shuffling still guarantees that prompts and completions stay aligned, right?

trl/trainer/grpo_trainer.py

qgallouedec

Looks good to me. To keep a detailed changelog, please either merge now to main (it should be possible), or wait for the other PR to be merged (but don't merge into the other branch)

lewtun

LGTM but would love to see shuffle_tensor_dict() unit tested since it's somewhat critical for training

edbeeching · 2025-04-30T07:01:28Z

> LGTM but would love to see shuffle_tensor_dict() unit tested since it's somewhat critical for training

Sure, I will also add some tests for split tensor dict, as I did not find any.
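A test along the lines requested here might check a few properties: keys and `None` entries survive, the rows are only reordered, and all tensors are reordered the same way. The sketch below is one possible shape for such a check, not the tests that were actually added to the repository. `reference_shuffle` is a hypothetical stand-in so the example runs on its own.

```python
import torch


def reference_shuffle(tensor_dict):
    # Hypothetical stand-in for the function under test.
    batch_size = next(t.shape[0] for t in tensor_dict.values() if t is not None)
    permutation = torch.randperm(batch_size)
    return {k: t[permutation] if t is not None else None for k, t in tensor_dict.items()}


def check_shuffle(shuffle_fn):
    """Assert the properties a shared-permutation shuffle should satisfy."""
    x = torch.arange(6)
    batch = {"x": x, "y": x.unsqueeze(1).repeat(1, 3) * 2, "z": None}
    out = shuffle_fn(batch)

    # Keys, None entries, and shapes are preserved.
    assert out.keys() == batch.keys()
    assert out["z"] is None
    assert out["x"].shape == batch["x"].shape
    assert out["y"].shape == batch["y"].shape

    # Rows are only reordered: sorting recovers the original values.
    assert torch.equal(out["x"].sort().values, x)

    # Alignment: row i of "y" still corresponds to row i of "x".
    assert torch.equal(out["y"], out["x"].unsqueeze(1).repeat(1, 3) * 2)
    return True


result = check_shuffle(reference_shuffle)
```

Passing the function under test as an argument keeps the check reusable if the real helper is imported instead of the stand-in.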

HaleyCH · 2025-05-30T06:27:37Z

This PR doesn't work correctly when my inputs contain non-Tensor inputs. Specifically, I'm passing a List[Tuple[int, int]] in the inputs for interpolation because torch.nn.functional.interpolate only supports size inputs of type int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int].
Converting the size in inputs to a Tensor and then converting it back feels a bit less elegant for my use case. Do you have any plans to support inputs containing non-Tensor elements?🥰

qgallouedec · 2025-05-30T06:31:58Z

You're working with a fork for your specific use case, right? Because _prepare_inputs isn't part of the public API.

HaleyCH · 2025-05-30T07:02:19Z

> You're working with a fork for your specific use case, right? Because _prepare_inputs isn't part of the public API.

Thanks for your reply. Yes, to train a LISA-like model architecture with GRPO, I subclassed trl.trainer.GRPOTrainer and overrode some of its methods.

[rank0]: Traceback (most recent call last):
[rank0]:   File "/raid0/workspace/a/reasoning-cod/rcod/train/train_lisa_grpo.py", line 133, in <module>
[rank0]:     trainer.train(resume_from_checkpoint=False)
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/transformers/src/transformers/trainer.py", line 2245, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/transformers/src/transformers/trainer.py", line 2556, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/transformers/src/transformers/trainer.py", line 3712, in training_step
[rank0]:     inputs = self._prepare_inputs(inputs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/trl/trl/extras/profiling.py", line 96, in wrapper
[rank0]:     return func(self, *args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/trl/trl/trainer/grpo_trainer.py", line 973, in _prepare_inputs
[rank0]:     generation_batch = shuffle_tensor_dict(generation_batch)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/raid0/workspace/a/reasoning-cod/submodules/trl/trl/trainer/grpo_trainer.py", line 247, in shuffle_tensor_dict
[rank0]:     return {key: tensor[permutation] if tensor is not None else None for key, tensor in tensor_dict.items()}
[rank0]:                  ~~~~~~^^^^^^^^^^^^^
[rank0]: TypeError: only integer tensors of a single element can be converted to an index

According to the traceback, training_step calls inputs = self._prepare_inputs(inputs), and at that point inputs contains None, Tensor, and non-tensor items. Is this error caused by my use of inheritance?
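The `TypeError` in the traceback is what Python raises when a plain list is indexed with a multi-element tensor: `tensor[permutation]` works for tensors, but for a list the permutation would have to convert to a single integer. One possible workaround in a subclass, sketched below under assumptions, is to reorder per-sample lists with the same permutation and pass everything else through. `shuffle_mixed_dict` is a hypothetical helper, not part of TRL, and it assumes every tensor has the batch as its first dimension.

```python
import torch


def shuffle_mixed_dict(batch):
    """Shuffle tensors and per-sample lists with one shared permutation.

    Hypothetical helper for illustration; not a TRL function.
    """
    tensors = [v for v in batch.values() if isinstance(v, torch.Tensor)]
    if not tensors:
        return dict(batch)  # nothing defines a batch size, so leave as-is
    batch_size = tensors[0].shape[0]
    permutation = torch.randperm(batch_size)
    order = permutation.tolist()  # plain ints can index Python lists

    shuffled = {}
    for key, value in batch.items():
        if isinstance(value, torch.Tensor):
            shuffled[key] = value[permutation]
        elif isinstance(value, (list, tuple)) and len(value) == batch_size:
            # Per-sample Python objects, e.g. a list of (height, width) tuples.
            shuffled[key] = [value[i] for i in order]
        else:
            # None and other non-batched values pass through unchanged.
            shuffled[key] = value
    return shuffled


mixed = {
    "input_ids": torch.arange(4).unsqueeze(1),
    "sizes": [(10, 10), (11, 11), (12, 12), (13, 13)],  # sample i has size (10 + i, 10 + i)
    "ref_logps": None,
}
mixed_shuffled = shuffle_mixed_dict(mixed)
```

Any later step that slices the batch into chunks would need the same treatment for list entries, since lists accept integer slices but not tensor indices.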

edbeeching added 7 commits April 29, 2025 09:23

adds num_minibatches

7323f1e

precommit

62f30e1

fix comment

4652699

nits

0489572

add tensor shuffling

49abfcb

precommit

e5289cc

fixes a few lines where grad_acc should have been replaced with num_m…

0bc313f

…inibatches

edbeeching mentioned this pull request Apr 29, 2025

💔 [GRPO] Decouple gradient accumulation from the number of minibatches generated #3388

Merged

edbeeching requested review from lewtun, qgallouedec, kashif and shirinyamani April 29, 2025 14:54

Merge branch 'grpo-decouple-grad-acc' into grpo-shuffle-mini-batches

5af5729

kashif approved these changes Apr 29, 2025

shirinyamani approved these changes Apr 29, 2025

qgallouedec reviewed Apr 29, 2025

qgallouedec approved these changes Apr 29, 2025

lewtun approved these changes Apr 29, 2025

edbeeching and others added 9 commits April 30, 2025 11:40

refactor to generation_batch_size

141efb4

precommit

90e36e1

fix docstring

fba8511

Update trl/trainer/grpo_trainer.py

0df4734

Co-authored-by: Quentin Gallouédec <[email protected]>

Merge branch 'grpo-decouple-grad-acc' into grpo-shuffle-mini-batches

64f5abd

test

71e4095

Suggestions for Decouple gradient accumulation from the number of min…

fefe183

…ibatches generated (#3396)

rename steps per generation

75fc4af

Merge branch 'grpo-decouple-grad-acc' into grpo-shuffle-mini-batches

71fdd3f

Merge branch 'grpo-shuffle-mini-batches' of github.com:huggingface/tr…

602abf5

…l into grpo-shuffle-mini-batches

qgallouedec changed the title ~~[GRPO] Shuffle mini batches~~ 🎲 [GRPO] Shuffle mini batches May 6, 2025

Base automatically changed from grpo-decouple-grad-acc to main May 6, 2025 07:59

Merge branch 'main' into grpo-shuffle-mini-batches

4d10817

edbeeching merged commit adfa7fd into main May 6, 2025
10 checks passed

edbeeching deleted the grpo-shuffle-mini-batches branch May 6, 2025 09:09

hjh0119 mentioned this pull request May 22, 2025

[grpo] generation batch size & mini-batch update modelscope/ms-swift#4322

Merged

4 tasks
