[DRRunner] corrected the negative learning rate in the schedule_function in Domain Randomisation Runner #10

Open
wants to merge 1 commit into base: main
Conversation

RobbenRibery

In the reset(self, rng) method, the learning rate is initially specified as a negative value, which causes training to break down completely. After turning it into a positive value and passing the scheduler into the optax chain (see line 153), ACCEL achieves generalisation on OOD envs [ref: WandB attached].

[Screenshot: WandB results, 2024-08-24]
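A minimal sketch of the kind of change described above (not the repo's actual code; the values standing in for self.lr, self.lr_final, self.anneal_steps, and the clipping norm are illustrative, and only public optax APIs are used):

import optax

# Illustrative values; in the runner these come from self.lr, self.lr_final,
# self.anneal_steps, and the gradient-clipping setting.
lr = 3e-4
lr_final = 3e-4
anneal_steps = 0
max_grad_norm = 0.5

# The schedule is built from a positive learning rate; the descent direction
# is handled by optax.scale(-1.0) at the end of the chain, not by negating lr.
schedule_fn = optax.linear_schedule(
    init_value=lr, end_value=lr_final, transition_steps=anneal_steps
)
tx = optax.chain(
    optax.clip_by_global_norm(max_grad_norm),
    optax.scale_by_adam(),
    optax.scale_by_schedule(schedule_fn),
    optax.scale(-1.0),
)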

…nner, supply the schedule_fn to the optax optimiser chain
@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 24, 2024
@minqi
Contributor

minqi commented Aug 24, 2024

Hi @RobbenRibery, that schedule_fn is a leftover from code we did not use in our experiments (the original L153 in your diff uses float(self.lr) without the negative sign).

Looking at optax.linear_schedule, it looks like your change should correctly default to a constant function returning the initial learning rate when self.anneal_steps == 0, so I think this is safe to merge. @samvelyan
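A small illustration of that fallback, assuming a recent optax version (with transition_steps == 0, linear_schedule reduces to a constant schedule at init_value, and end_value is ignored):

import optax

schedule_fn = optax.linear_schedule(init_value=3e-4, end_value=1e-5, transition_steps=0)
assert schedule_fn(0) == 3e-4        # constant from the first step
assert schedule_fn(100_000) == 3e-4  # no annealing ever takes place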

@RobbenRibery
Author

Hi @minqi, thanks for your comment! I see your point. We could enforce something like self.anneal_steps == 0 or self.lr_final == self.lr.

Happy to run some experiments to see if annealing helps further stabilise the training.

@minqi
Contributor

minqi commented Aug 24, 2024

Hi @RobbenRibery, the default setting for self.anneal_steps is 0, and for self.lr_final it is None, in which case it defaults to the same value as self.lr, so no changes there are necessary.

We previously looked at linear annealing, but found it mostly hurt final policy performance on OOD tasks.

@RobbenRibery
Author

Thanks, appreciated!

@RobbenRibery
Author

Hi @minqi, I also found that by setting the following:

export XLA_FLAGS='--xla_gpu_deterministic_ops=true --xla_gpu_autotune_level=0' 
export TF_DETERMINISTIC_OPS=1
python -m minimax.train -- ..... 

I could make the ACCEL runs deterministic, at roughly 20% of the SPS of the non-deterministic runs. Without these flags, even with every RNG split set correctly, I still got different results across runs.
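Equivalently, the same flags can be set from Python, as long as they are assigned before JAX initialises its backend (a sketch using the flag values above):

import os

# Must be set before importing jax so the XLA backend picks them up.
os.environ['XLA_FLAGS'] = '--xla_gpu_deterministic_ops=true --xla_gpu_autotune_level=0'
os.environ['TF_DETERMINISTIC_OPS'] = '1'

import jax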

ref: WandB attached:
[Screenshot: WandB results, 2024-08-31]
