Fix the reward computation problem and step counting logic, and update out-of-date dependencies (problem fix V1) #12
Main changes:
Some operations on reward-related variables differ from the original author's Dreamer (TensorFlow 2) implementation. After fixing them, I found that the issue "Does the walker run reproduce correctly?" is resolved: my original result on the walker-run task was "episodes: 1000, total_steps: 501505, train_reward: 376.6", and the fixed result is "episodes: 1000, total_steps: 501505, train_reward: 637.5". The test reward is as follows:

This closely matches the result reported in the original paper.
The step counting logic is inconsistent: `metrics['steps']` is computed as `t * args.action_repeat` during experience buffer initialization, but as `t` during data collection. In addition, the step counter during data collection should start at 1 instead of 0. Together, these make the final `total_steps` come out to 501505 = 5 (random seed episodes) * 1000 (computed as `t * args.action_repeat`) + 995 (the remaining episodes) * 499 (computed as `t` starting from 0), which is confusing. I unify the counting rule to `t * args.action_repeat`, so that `metrics['steps']` represents the number of environment interactions, and correct the starting position of the per-episode counter during data collection.
Other changes:
`model_learning-rate` is modified according to the hyperparameters appendix of the original paper, *Dream to Control: Learning Behaviors by Latent Imagination*.
`env.py` is updated to support the "quadruped-run" task.
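As a rough illustration of the `env.py` change (the actual list and dict names in `env.py` may differ; this assumes the common pattern of a dm_control task whitelist plus a per-domain action-repeat table):

```python
# Hypothetical sketch of the env.py change; names and existing entries are assumed.
CONTROL_SUITE_ENVS = [
    'cheetah-run', 'walker-walk', 'walker-run',
    'quadruped-run',  # newly supported task
]
CONTROL_SUITE_ACTION_REPEATS = {
    'cheetah': 4, 'walker': 2,
    'quadruped': 2,  # assumed; Dreamer uses action repeat 2 for dm_control tasks
}
```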