Skip to content

[DRRunner] corrected the negative learning rate in the schedule_function in Domain Randomisation Runner#10

Open
RobbenRibery wants to merge 1 commit into
facebookresearch:mainfrom
RobbenRibery:main
Open

[DRRunner] corrected the negative learning rate in the schedule_function in Domain Randomisation Runner#10
RobbenRibery wants to merge 1 commit into
facebookresearch:mainfrom
RobbenRibery:main

Conversation

@RobbenRibery

Copy link
Copy Markdown

In the reset(self, rng) method, the learning rate seems negative as initially specified.
This triggers the learning to break down completely. After turning it into a positive value, pass the scheduler into the optax chain (see line 153). ACCEL achieves generalisation on OOD envs [ref: WANDB attached]

Screenshot 2024-08-24 at 19 21 20

…nner, supply the schedule_fn to the optax optimiser chain
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 24, 2024
@minqi

minqi commented Aug 24, 2024

Copy link
Copy Markdown
Contributor

Hi @RobbenRibery, that schedule_fn is a leftover from code we did not use in our experiments (the original L153 in your diff uses float(self.lr without the negative sign.)

Looking at optax.linear_schedule it looks like your change should default correctly to a constant function returning the initial learning rate if self.anneal_steps == 0, so I think this is safe to merge. @samvelyan

@RobbenRibery

Copy link
Copy Markdown
Author

Hi @minqi, thanks for your comment! I see your point. We can enforce something like self.anneal_steps == 0 or self.lr_final == self.lr

Happy to run some experiments to see if annealing help further stablise the training.

@minqi

minqi commented Aug 24, 2024

Copy link
Copy Markdown
Contributor

Hi @RobbenRibery, the default setting for self.anneal_steps is 0, and for self.lr_final it is None, in which case it defaults to the same value as self.lr, so no changes there are necessary.

We previously looked at linear annealing, but found it mostly hurt final policy performance on OOD tasks.

@RobbenRibery

Copy link
Copy Markdown
Author

Thanks, appreciated!

@RobbenRibery

Copy link
Copy Markdown
Author

Hi Minqi, @minqi, I also find that by setting the following:

export XLA_FLAGS='--xla_gpu_deterministic_ops=true --xla_gpu_autotune_level=0' 
export TF_DETERMINISTIC_OPS=1
python -m minimax.train -- ..... 

I could make the ACCEL runs deterministic at about 20% SPS compared to the non-deterministic runs. Otherwise, even if every RNG split is set correctly, I could still get different results.

ref wand attached:
Screenshot 2024-08-31 at 14 53 43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants