Fix the reward computation problem and step counting logic, and update out-of-date dependencies (problem fix V1) #12
Main changes:
Some operations on reward-related variables differ from the original author's Dreamer (TensorFlow 2) implementation. After fixing them, I found that the issue "Does the walker run reproduce correctly?" is resolved: my original result on the walker-run task was "episodes: 1000, total_steps: 501505, train_reward: 376.6", and the fixed result is "episodes: 1000, total_steps: 501505, train_reward: 637.5". The test reward is as follows:

This closely matches the result reported in the original paper.
The step counting logic is inconsistent: `metrics['steps']` is computed as `t * args.action_repeat` during experience buffer initialization, but as `t` during data collection. In addition, the step counter during data collection should start at 1 instead of 0. Together, these make the final `total_steps` come out to 501505 = 5 (random seed episodes) * 1000 (computed as `t * args.action_repeat`) + 995 (the remaining episodes) * 499 (computed as `t` starting from 0), which is confusing. I unify the counting rule to `t * args.action_repeat`, so that `metrics['steps']` represents the number of environment interactions, and correct the starting position of the per-episode counter during data collection.
Other changes:
`model_learning-rate` is modified according to the hyperparameters appendix of the original paper, *Dream to Control: Learning Behaviors by Latent Imagination*.
`env.py` is updated to support the "quadruped-run" task.
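As a rough illustration of the `env.py` change (the actual list and dict names in `env.py` may differ; this assumes the common pattern of a dm_control task whitelist plus a per-domain action-repeat table):

```python
# Hypothetical sketch of the env.py change; names and existing entries are assumed.
CONTROL_SUITE_ENVS = [
    'cheetah-run', 'walker-walk', 'walker-run',
    'quadruped-run',  # newly supported task
]
CONTROL_SUITE_ACTION_REPEATS = {
    'cheetah': 4, 'walker': 2,
    'quadruped': 2,  # assumed; Dreamer uses action repeat 2 for dm_control tasks
}
```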