Does the walker run reproduce correctly? #4

Open
letusfly85 opened this issue Sep 12, 2020 · 8 comments

@letusfly85

Hi, thank you for the cool repository!

I tried several tasks, such as walker walk and cheetah run. They seem to work fine.

But when I run walker run, the episode_reward does not reach around 700.
Is there any problem...? 🤔

[Screenshot 2020-09-12: episode_reward curve for walker run]

The original paper seems to show walker run reaching around 700 at 1M steps (page 7, Figure 7):

https://arxiv.org/pdf/1912.01603.pdf

Thank you.

@yusukeurakami (Owner)

I know it is too late to comment on this issue, but could you tell me the hyperparameters you used for this experiment?

@coderlemon17

@yusukeurakami Hi, I also find that the results for walker-run are weird. I ran the experiment with 5 different seeds, and here's what I got:
[image: walker-run training curves over 5 seeds]

And the hyperparameters I used are:

Hyperparameters
action_noise: 0.3
action_repeat: 2
actor_lr: 8.0e-05
adam_epsilon: 1.0e-07
algo: dreamer
batch_size: 50
belief_size: 200
bit_depth: 5
candidates: 1000
checkpoint_interval: 50
chunk_size: 50
cnn_activation_function: relu
collect_interval: 100
comment: ''
config: dm_control/dreamer/walker-run.yaml
dense_activation_function: elu
device: cuda:3
embedding_size: 1024
env: walker-run
episodes: 1000
exp_ckpt: ''
experience_size: 1000000
free_nats: 3
gamma: 0.99
global_kl_beta: 0.0
grad_clip_norm: 100.0
hidden_size: 200
id: dreamer
lambda_: 0.95
max_episode_length: 1000
model_ckpt: ''
model_lr: 0.001
model_lr_schedule: 0
optimisation_iters: 10
overshooting_distance: 50
overshooting_kl_beta: 0.0
overshooting_reward_scale: 0.0
planning_horizon: 15
render: false
save_experience_buffer: false
seed: 0
seed_episodes: 5
state_size: 30
symbolic_env: false
test: false
test_episodes: 10
test_interval: 10
top_candidates: 100
torch_deterministic: true
value_lr: 8.0e-05
worldmodel_LogProbLoss: false

@yingchengyang

I have the same question. Hoping for your reply. Thanks.

@sumwailiu

sumwailiu commented Jan 8, 2025

I think the root cause is that there are some minor mistakes in the reward computation (as discussed in Issue #5). After fixing them (see pull request #12), I found the walker-run task reproduces correctly.

@coderlemon17

> I think the root cause is that there are some minor mistakes in the reward computation (as discussed in Issue #5). After fixing them (see pull request #12), I found the walker-run task reproduces correctly.

Hi, thanks for the explanation. However, I'm a little confused about the fix.
Assume the sequence is $(s_1, a_1, r_1, \cdots)$. To my understanding, you are trying to predict $r_t$ with $(s_t, h_t)$, i.e. $(s_{\leq t}, a_{<t})$. However, you need $(s_{\leq t+1}, a_{<t+1})$ to predict $r_t$, is that correct?

@sumwailiu

sumwailiu commented Jan 8, 2025

> Hi, thanks for the explanation. However, I'm a little confused about the fix. Assume the sequence is $(s_1, a_1, r_1, \cdots)$. To my understanding, you are trying to predict $r_t$ with $(s_t, h_t)$, i.e. $(s_{\leq t}, a_{<t})$. However, you need $(s_{\leq t+1}, a_{<t+1})$ to predict $r_t$, is that correct?

It is correct that I try to predict $r_t$ with $(s_t, h_t)$, and I don't use $(s_{t+1}, a_t)$ to predict $r_t$.

Actually, the original implementation of dreamer-torch (i.e., `reward_loss = -reward_dist.log_prob(rewards[:-1]).mean(dim=(0, 1))`) tries to predict $r_{t-1}$ with $(s_t, h_t)$. In other words, it needs $(s_{\leq t+1}, a_{<t+1})$ to predict $r_t$.
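
Below is a minimal sketch of that indexing difference, assuming time-major `(T, B)` reward chunks and `T-1` rolled-out latents; the tensor names and the `rewards[1:]` target are my own illustration of the alignment being discussed here, not the literal change made in #12/#13.

```python
import torch
import torch.distributions as D

# Illustrative shapes only: a chunk of T steps, batch size B.
T, B = 50, 4
rewards = torch.randn(T, B)          # r_1 .. r_T from the replay chunk
# Stand-in for reward_model(beliefs, posterior_states): one prediction per
# rolled-out latent (h_t, s_t), assumed here to cover t = 2 .. T (T-1 entries).
reward_mean = torch.randn(T - 1, B)
reward_dist = D.Normal(reward_mean, 1.0)

# Original indexing: the prediction from (h_t, s_t) is scored against r_{t-1},
# so recovering r_t would also need the next latent, i.e. (s_{<=t+1}, a_{<t+1}).
reward_loss_old = -reward_dist.log_prob(rewards[:-1]).mean(dim=(0, 1))

# Shifted target: the same prediction is scored against r_t, so (h_t, s_t)
# alone is asked for the reward of its own step.
reward_loss_new = -reward_dist.log_prob(rewards[1:]).mean(dim=(0, 1))
```

In this sketch only the target slice changes; the reward head and the rolled-out latents are untouched.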

@yusukeurakami (Owner)

Thanks @sumwailiu for your PR.
Let me test it on my side with your PR. I need to recall the details first; I forgot how it's implemented in the paper and the original repo.

@sumwailiu

@coderlemon17 @yusukeurakami There are some mistakes in the first version of the fix (#12), although the reward computation there is indeed correct. Please refer to the second version (#13).
