Missing clipped value loss in PPO implementation #19

francelico · 2024-02-15T16:54:53Z

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments in Procgen, and I've noticed that your PPO implementation does not use value loss clipping. However, clipping is enabled by default in the PyTorch implementation that papers testing agents on Procgen use most often.

Is there a reason why it was left out? I'm not super familiar with ALE; perhaps it is less common there?
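For reference, here is a minimal sketch of the clipped value loss being discussed, written in the style of OpenAI Baselines' PPO2. It is not Cleanba's code. The function name `ppo_value_loss` and the `clip_vloss` flag are hypothetical, and reusing the policy clip coefficient (0.2) for the value clip is a common convention but an assumption here. The new value prediction is kept within `clip_coef` of the prediction recorded at rollout time, and the loss takes the elementwise maximum of the clipped and unclipped squared errors.

```python
import numpy as np


def ppo_value_loss(new_values, old_values, returns, clip_coef=0.2, clip_vloss=True):
    """PPO value loss with optional clipping (illustrative sketch).

    new_values: value predictions from the network being updated
    old_values: value predictions recorded when the rollout was collected
    returns:    return targets, e.g. advantages + old_values
    """
    new_values = np.asarray(new_values, dtype=np.float64)
    old_values = np.asarray(old_values, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)

    # Plain squared error against the return targets.
    unclipped = (new_values - returns) ** 2
    if not clip_vloss:
        return 0.5 * unclipped.mean()

    # Restrict how far the new prediction may move from the rollout-time prediction.
    clipped_values = old_values + np.clip(new_values - old_values, -clip_coef, clip_coef)
    clipped = (clipped_values - returns) ** 2
    # Take the larger (pessimistic) of the two errors per sample, then average.
    return 0.5 * np.maximum(unclipped, clipped).mean()


# The new prediction jumped from 0.0 to 1.0 and matches the target exactly,
# but the clipped prediction is only 0.2, so clipping still yields a loss.
print(ppo_value_loss([1.0], [0.0], [1.0]))                    # 0.5 * 0.8**2 = 0.32
print(ppo_value_loss([1.0], [0.0], [1.0], clip_vloss=False))  # 0.0
```

With clipping, the value network cannot reduce its loss by moving far from its previous prediction in a single update. Without clipping, the loss reduces to the ordinary squared error.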

As part of my project, I've written scripts to train and evaluate PPO in Procgen* and implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to open a PR to include them in Cleanba?

*On top of re-implementing value loss clipping in PPO, I found minor differences between the Atari and Procgen environments: for example, the info dict returned by envpool.step() is slightly different, and the videos in the eval script only support grayscale images.
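One way to let a video writer accept both kinds of frames is to convert every frame to RGB before writing it. The sketch below uses a hypothetical helper, `to_rgb_frame`, that is not part of Cleanba or EnvPool. It assumes channel-last frames with pixel values already in 0-255. Channel-first observations would need to be transposed first.

```python
import numpy as np


def to_rgb_frame(frame):
    """Return a frame as an (H, W, 3) uint8 array for video writing.

    Hypothetical helper: accepts grayscale (H, W) or (H, W, 1) frames and
    RGB (H, W, 3) frames. Assumes channel-last layout and 0-255 pixel values.
    """
    frame = np.asarray(frame)
    if frame.ndim == 3 and frame.shape[-1] == 1:
        # Drop a trailing singleton channel axis.
        frame = frame[..., 0]
    if frame.ndim == 2:
        # Replicate the single grayscale channel into R, G and B.
        frame = np.stack([frame] * 3, axis=-1)
    if frame.ndim != 3 or frame.shape[-1] != 3:
        raise ValueError(f"expected (H, W), (H, W, 1) or (H, W, 3), got {frame.shape}")
    return frame.astype(np.uint8)


gray = np.zeros((84, 84), dtype=np.uint8)   # e.g. a single grayscale frame
rgb = np.zeros((64, 64, 3), dtype=np.uint8)  # e.g. an RGB frame
print(to_rgb_frame(gray).shape, to_rgb_frame(rgb).shape)
```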

vwxyzjn · 2024-02-16T02:29:00Z

Hi @francelico, thanks for the message. I turned it off because, in practice, it didn't seem to matter much for performance. I'd love to have a DAAC agent in Cleanba, but maybe not for now, as this repo is mainly for distributed DRL and is more or less archived.
