
Question on Normalization Methods for Non-Atari Benchmarks in Rliable Analysis #29

Open
amantuer opened this issue Nov 10, 2024 · 0 comments

Hello Rliable team,

I'm using Rliable to analyze reinforcement learning results on environments such as the DeepMind Control Suite and PyBullet, where human-normalized scores (as used in the Atari benchmarks) are unavailable, including for recent algorithms like SARC. In the Atari experiments, human reference scores are what make it possible to standardize results across games and algorithms. Since DMC and PyBullet have no equivalent baseline, I'm considering Z-score normalization and percentile normalization for comparing different RL algorithms.
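
Concretely, this is the kind of per-task normalization I have in mind (a minimal sketch; the `(num_runs, num_tasks)` score layout follows rliable's convention, but pooling statistics across all algorithms and the epsilon guard are my own assumptions):

```python
import numpy as np

# score_dict: algorithm name -> raw returns, shape (num_runs, num_tasks),
# following rliable's score-matrix convention.

def zscore_normalize(score_dict):
    """Z-score each task using mean/std pooled over all algorithms' runs."""
    pooled = np.concatenate(list(score_dict.values()), axis=0)
    mu = pooled.mean(axis=0)
    sigma = pooled.std(axis=0) + 1e-8  # guard against zero variance
    return {alg: (scores - mu) / sigma for alg, scores in score_dict.items()}

def percentile_normalize(score_dict):
    """Map each score to its empirical percentile within the pooled per-task scores."""
    pooled = np.concatenate(list(score_dict.values()), axis=0)
    normalized = {}
    for alg, scores in score_dict.items():
        ranks = np.empty_like(scores, dtype=float)
        for t in range(scores.shape[1]):
            sorted_pool = np.sort(pooled[:, t])
            # Fraction of pooled scores <= each score, giving a value in (0, 1].
            ranks[:, t] = (np.searchsorted(sorted_pool, scores[:, t], side="right")
                           / len(sorted_pool))
        normalized[alg] = ranks
    return normalized
```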

Could you share any guidance on best practices for normalization in these cases or suggest other robust approaches that align with Rliable’s statistical rigor?
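
For reference, here is how I had planned to feed the normalized scores into rliable's interval estimates (assuming I've read the API correctly; the choice of IQM as the aggregate and the bootstrap `reps` count are just placeholders):

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

normalized = zscore_normalize(score_dict)  # from the sketch above

# Aggregate with IQM and compute stratified-bootstrap confidence intervals.
iqm = lambda scores: np.array([metrics.aggregate_iqm(scores)])
point_estimates, interval_estimates = rly.get_interval_estimates(
    normalized, iqm, reps=2000)
```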

Thank you for the insightful tools and methodologies you provide!
