[DRRunner] corrected the negative learning rate in the schedule_function in Domain Randomisation Runner #10
…nner, supply the schedule_fn to the optax optimiser chain

Hi @RobbenRibery, that Looking at

Hi @minqi, thanks for your comment! I see your point. We can enforce something like Happy to run some experiments to see if annealing helps further stabilise the training.

Hi @RobbenRibery, the default setting for We previously looked at linear annealing, but found it mostly hurt final policy performance on OOD tasks.

Thanks, appreciated!

Hi Minqi, @minqi, I also find that by setting the following: I could make the ACCEL runs deterministic at about 20% SPS compared to the non-deterministic runs. Otherwise, even if every RNG split is set correctly, I could still get different results.

In the reset(self, rng) method, the learning rate appears to be negative as initially specified.
This causes learning to break down completely. After changing it to a positive value and passing the scheduler into the optax chain (see line 153), ACCEL achieves generalisation on OOD envs [ref: WANDB attached].