

Missing clipped value loss in PPO implementation #19

Open
francelico opened this issue Feb 15, 2024 · 1 comment

@francelico

Hi @vwxyzjn,

This codebase is great, thanks for the hard work! I've been using it to run baseline experiments in procgen, and I've noticed that your implementation of PPO does not use value loss clipping. However, it is enabled by default in the PyTorch implementation that is most often encountered in papers testing agents in procgen.

Is there a reason why it was left out? I'm not super familiar with ALE, perhaps it is not as common there?

As part of my project, I've created scripts to train and evaluate PPO in procgen* and I've implemented the DAAC agent (https://arxiv.org/abs/2102.10330). Would you like me to make a PR to include them in Cleanba?

*On top of re-implementing value loss clipping in PPO I found minor differences between the atari and procgen environments, such as the info dict returned by envpool.step() being slightly different, and the videos in the eval script supporting grayscale images only.
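For readers unfamiliar with the technique, here is a minimal JAX sketch of PPO value-loss clipping as it is commonly formulated (for example in OpenAI Baselines-style PPO): the new value prediction is clipped to stay within a range of the value recorded at rollout time, and the loss takes the elementwise maximum of the clipped and unclipped squared errors. The function name `value_loss`, the `clip_vloss` switch and the default `clip_coef=0.2` are illustrative assumptions, not Cleanba's actual code.

```python
import jax.numpy as jnp


def value_loss(new_values, old_values, returns, clip_coef=0.2, clip_vloss=True):
    """Hypothetical PPO value loss with optional value clipping (illustrative only)."""
    # Plain squared error between the current value prediction and the return target.
    unclipped = (new_values - returns) ** 2
    if not clip_vloss:
        return 0.5 * unclipped.mean()
    # Restrict how far the new prediction may move from the rollout-time value.
    clipped_values = old_values + jnp.clip(new_values - old_values, -clip_coef, clip_coef)
    clipped = (clipped_values - returns) ** 2
    # Pessimistic choice: take the larger of the two squared errors per sample.
    return 0.5 * jnp.maximum(unclipped, clipped).mean()


# Example: the new prediction jumps from 0.0 to 1.0 and exactly matches the return.
new_v, old_v, ret = jnp.array([1.0]), jnp.array([0.0]), jnp.array([1.0])
print(float(value_loss(new_v, old_v, ret)))                    # clipped to 0.2, loss ~0.32
print(float(value_loss(new_v, old_v, ret, clip_vloss=False)))  # plain loss is 0.0
```

With `clip_vloss=False` the function reduces to the plain squared-error value loss, i.e. the unclipped variant this issue describes Cleanba as using; the example shows how clipping penalizes a large jump in the value estimate even when the jump lands exactly on the target.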

@vwxyzjn
Owner

vwxyzjn commented Feb 16, 2024

Hi @francelico, thanks for the message. I turned it off because, in practice, it didn't seem to matter much for performance. As much as I'd love to have a DAAC agent in Cleanba, maybe not for now, as this repo is mainly for distributed DRL work and is more or less archived.
