
Fix the reward computation problem, step counting logic and update out-of-date dependencies (problem fix V1) #12

Open
wants to merge 4 commits into base: dreamer-torch1.8.2

Conversation


@sumwailiu sumwailiu commented Jan 8, 2025

Main changes:

  • Some operations on reward-related variables differ from the original author's TensorFlow 2 implementation of Dreamer. Fixing them resolves the issue "Does the walker run reproduce correctly?": my original result for the walker-run task was "episodes: 1000, total_steps: 501505, train_reward: 376.6", while the fixed result is "episodes: 1000, total_steps: 501505, train_reward: 637.5". The test reward is shown below:
    [plot: test reward curve]
    This matches the result from the original paper well.

  • The step counting logic is inconsistent: metrics['steps'] is computed as t * args.action_repeat during experience buffer initialization, but as t during data collection. In addition, the per-episode counter during data collection should start at 1 instead of 0. Together these make the final total_steps 501505 = 5 (random seed episodes) * 1000 (computed as t * args.action_repeat) + 995 (remaining episodes) * 499 (computed as t starting from 0), which is confusing. I unify the counting rule to t * args.action_repeat so that metrics['steps'] represents the number of environment interactions, and correct the starting position of the counter for each episode during data collection; see the sketch after this list.
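A minimal sketch of the unified counting rule, assuming a simplified collection loop: the names metrics, t and action_repeat follow the PR description, while ToyEnv and collect_episode are hypothetical stand-ins for the repo's actual code.

```python
class ToyEnv:
    """Stand-in for the wrapped environment (one interaction per step call)."""
    def step(self, action):
        return None, 0.0, False  # observation, reward, done


def collect_episode(env, metrics, action_repeat, max_episode_length):
    # Start counting at t = 1 (not 0) so a full episode contributes exactly
    # max_episode_length environment interactions.
    for t in range(1, max_episode_length // action_repeat + 1):
        _, _, done = env.step(action=None)
        if done:
            break
    # Same rule as buffer initialization: steps = t * action_repeat,
    # accumulated on top of the previous total.
    metrics['steps'].append(t * action_repeat + metrics['steps'][-1])


metrics = {'steps': [1000]}   # e.g. after one seed episode of 1000 env steps
collect_episode(ToyEnv(), metrics, action_repeat=2, max_episode_length=1000)
print(metrics['steps'])       # [1000, 2000]: each episode adds 1000 env steps
```

With the old mix of rules, a 1000-step-budget episode counted as 499 instead of 1000, which is where the odd 501505 total came from.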

Other changes:

  • The default model_learning-rate is modified to match the Hyper Parameters appendix of the original paper, Dream to Control: Learning Behaviors by Latent Imagination.
  • I found that the originally pinned versions of torch, torchaudio and torchvision can no longer be installed, so I updated these packages to installable versions.
  • env.py is updated to support the "quadruped-run" task (see the sketch below).
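As an illustration only, here is a hedged sketch of the kind of registration such a change could involve, assuming a PlaNet-style env.py with CONTROL_SUITE_ENVS and CONTROL_SUITE_ACTION_REPEATS constants; the actual names in this repo and the action-repeat value chosen in the PR should be checked against the file.

```python
# Hypothetical sketch: register the quadruped domain alongside the existing
# control-suite tasks. Constant names and values are assumptions, not the
# PR's literal diff.
CONTROL_SUITE_ENVS = [
    'cartpole-balance', 'cartpole-swingup', 'reacher-easy', 'finger-spin',
    'cheetah-run', 'ball_in_cup-catch', 'walker-walk', 'walker-run',
    'quadruped-run',  # new task
]
CONTROL_SUITE_ACTION_REPEATS = {
    'cartpole': 8, 'reacher': 4, 'finger': 2, 'cheetah': 4,
    'ball_in_cup': 6, 'walker': 2,
    'quadruped': 2,  # new domain; Dreamer uses action repeat 2 on dm_control
}
```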

@sumwailiu sumwailiu changed the title Fix the reward computation problems and update out-of-date dependencies Fix the reward computation problem as well as step counting logic and update out-of-date dependencies Jan 8, 2025
@sumwailiu sumwailiu changed the title Fix the reward computation problem as well as step counting logic and update out-of-date dependencies Fix the reward computation problem, step counting logic and update out-of-date dependencies Jan 8, 2025
@sumwailiu sumwailiu mentioned this pull request Jan 8, 2025
@sumwailiu sumwailiu mentioned this pull request Feb 15, 2025
@sumwailiu sumwailiu changed the title Fix the reward computation problem, step counting logic and update out-of-date dependencies Fix the reward computation problem, step counting logic and update out-of-date dependencies (problem fix V1) Feb 15, 2025