Stability issues #4

rubenftech · 2024-04-16T16:30:15Z

Hi,

Thank you for the amazing work!

While experimenting with your code, we're observing stability issues across multiple training runs. Here is an example of one of the rew_total graphs:

Is this behavior expected, or does it indicate an underlying problem? Is the maximum total reward reached here (around 350) similar to what you got? Additionally, if you could share the graphs from one of your runs, it would help us track down the issue and understand the expected behavior.

Thanks!

YandongJi · 2024-04-16T17:02:07Z

Thanks for bringing up the issue! We actually never tried training it for 500k, usually 50k at most. Up to around 50k, your curve looks very similar to mine. The reward scales should be tuned better to make the graph look more stable; I can try tuning them in the next few days. In the meantime, can you also evaluate the policy? It should usually perform OK. FYI, this work uses the same reward scale, and it looks like they get similar results.
