[DRRunner] corrected the negative learning rate in the schedule_function in Domain Randomisation Runner #10

Open
wants to merge 1 commit into base: main
Conversation

RobbenRibery

In the reset(self, rng) method, the learning rate is initially specified as a negative value, which causes training to break down completely. After turning it into a positive value and passing the scheduler into the optax chain (see line 153), ACCEL achieves generalisation on OOD envs [ref: WandB attached].

[Screenshot: WandB results, 2024-08-24]
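A minimal sketch of the kind of change described above (not the repo's actual code; the values standing in for self.lr, self.lr_final, self.anneal_steps, and the clipping norm are illustrative, and only public optax APIs are used):

import optax

# Illustrative values; in the runner these come from self.lr, self.lr_final,
# self.anneal_steps, and the gradient-clipping setting.
lr = 3e-4
lr_final = 3e-4
anneal_steps = 0
max_grad_norm = 0.5

# The schedule is built from a positive learning rate; the descent direction
# is handled by optax.scale(-1.0) at the end of the chain, not by negating lr.
schedule_fn = optax.linear_schedule(
    init_value=lr, end_value=lr_final, transition_steps=anneal_steps
)
tx = optax.chain(
    optax.clip_by_global_norm(max_grad_norm),
    optax.scale_by_adam(),
    optax.scale_by_schedule(schedule_fn),
    optax.scale(-1.0),
)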

…nner, supply the schedule_fn to the optax optimiser chain
@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 24, 2024
@minqi
Contributor

minqi commented Aug 24, 2024

Hi @RobbenRibery, that schedule_fn is a leftover from code we did not use in our experiments (the original L153 in your diff uses float(self.lr) without the negative sign).

Looking at optax.linear_schedule, it looks like your change should correctly default to a constant function returning the initial learning rate when self.anneal_steps == 0, so I think this is safe to merge. @samvelyan
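A small illustration of that fallback, assuming a recent optax version (with transition_steps == 0, linear_schedule reduces to a constant schedule at init_value, and end_value is ignored):

import optax

schedule_fn = optax.linear_schedule(init_value=3e-4, end_value=1e-5, transition_steps=0)
assert schedule_fn(0) == 3e-4        # constant from the first step
assert schedule_fn(100_000) == 3e-4  # no annealing ever takes place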

@RobbenRibery
Author

Hi @minqi, thanks for your comment! I see your point. We could enforce something like self.anneal_steps == 0 or self.lr_final == self.lr.

Happy to run some experiments to see if annealing helps further stabilise the training.

@minqi
Contributor

minqi commented Aug 24, 2024

Hi @RobbenRibery, the default setting for self.anneal_steps is 0, and for self.lr_final it is None, in which case it defaults to the same value as self.lr, so no changes there are necessary.

We previously looked at linear annealing, but found it mostly hurt final policy performance on OOD tasks.

@RobbenRibery
Author

Thanks, appreciated!

@RobbenRibery
Author

Hi @minqi, I also found that by setting the following:

export XLA_FLAGS='--xla_gpu_deterministic_ops=true --xla_gpu_autotune_level=0' 
export TF_DETERMINISTIC_OPS=1
python -m minimax.train -- ..... 

I could make the ACCEL runs deterministic, at roughly 20% of the SPS of the non-deterministic runs. Without these flags, even with every RNG split set correctly, I still got different results across runs.
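Equivalently, the same flags can be set from Python, as long as they are assigned before JAX initialises its backend (a sketch using the flag values above):

import os

# Must be set before importing jax so the XLA backend picks them up.
os.environ['XLA_FLAGS'] = '--xla_gpu_deterministic_ops=true --xla_gpu_autotune_level=0'
os.environ['TF_DETERMINISTIC_OPS'] = '1'

import jax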

ref: WandB attached:
[Screenshot: WandB results, 2024-08-31]
