Hello Rliable team,
I’m using Rliable to analyze reinforcement learning results on environments such as the DeepMind Control Suite (DMC) and PyBullet, where human-normalized scores (as used in Atari benchmarks) are unavailable for recent algorithms like SARC. In the Atari experiments, human benchmarks are crucial for standardizing scores across algorithms. Without equivalent baselines in DMC or PyBullet, I’m considering Z-score normalization and percentile normalization for comparing different RL algorithms.
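For context, here is a minimal sketch of the two schemes I have in mind, using plain NumPy on a per-task score matrix of shape `(num_runs, num_tasks)` (the layout Rliable's interval-estimate utilities expect). The function names and the choice of reference scores are my own assumptions, not part of Rliable's API:

```python
import numpy as np


def z_score_normalize(scores, reference_scores):
    """Z-score normalize per task against a reference pool.

    scores, reference_scores: arrays of shape (num_runs, num_tasks).
    The reference pool (e.g. all algorithms' runs combined) is an
    assumption on my part -- there is no canonical baseline in DMC/PyBullet.
    """
    mu = reference_scores.mean(axis=0)
    sigma = reference_scores.std(axis=0)
    # Guard against zero variance on a task.
    return (scores - mu) / np.where(sigma == 0, 1.0, sigma)


def percentile_normalize(scores, reference_scores):
    """Map each score to its empirical percentile (in [0, 1]) per task."""
    out = np.empty(scores.shape, dtype=float)
    for t in range(scores.shape[1]):
        pool = np.sort(reference_scores[:, t])
        out[:, t] = np.searchsorted(pool, scores[:, t]) / pool.shape[0]
    return out
```

The resulting normalized matrices could then be passed, per algorithm, into Rliable's aggregate metrics (IQM, optimality gap) in place of human-normalized scores, though I'm unsure whether pooling across algorithms for the reference statistics biases the comparison.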
Could you share any guidance on best practices for normalization in these cases or suggest other robust approaches that align with Rliable’s statistical rigor?
Thank you for the insightful tools and methodologies you provide!