The model does not converge for breakout #211

yungangwu · 2022-10-20T02:58:33Z

Search before asking

I have searched the MuZero issues and found no similar feature requests.

Description

I trained MuZero on Breakout with the hyperparameters given in the code, but even after 450,000 steps its reward was still 0 and it showed no sign of convergence. So I would like to ask: are the hyperparameters in the code validated? Thank you!

Additional context

No response

JohnPPP · 2022-10-20T06:43:06Z

Same issue here, but for all envs.

yungangwu · 2022-10-20T06:46:57Z

Have you tried any other parameter settings? For example, with batch_size set to 1024, does the model converge under some hyperparameter setting? @JohnPPP

JohnPPP · 2022-10-20T08:25:01Z

Tried a bunch of hyperparameters on a bunch of games. Just wasted my time. Perhaps others can show me how this can work...

yungangwu · 2022-10-20T08:31:27Z

gg. I ran into the same problem and did a lot of experiments, but nothing worked. I don't know if there is a mistake in the code. @JohnPPP

JohnPPP · 2022-10-20T11:21:09Z

Yeah, there probably is.

dillonmsandhu · 2022-10-31T20:39:13Z

Did the reward stay zero the entire time, or did it occasionally get some reward? I have it working on cartpole, but not on Atari. That said, it still gets a reward of 2 or 3 occasionally in breakout, indicating that it is behaving randomly.
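One way to check the "behaving randomly" reading is to compare the agent's episode returns against a uniform-random policy in the same environment. The snippet below is only a minimal sketch of that comparison: `ToyEnv`, `rollout_returns`, and `beats_baseline` are hypothetical names made up for illustration and are not part of muzero-general, and the toy environment merely stands in for Breakout so the example runs without an Atari install.

```python
import random
import statistics


def rollout_returns(policy, env_factory, episodes, max_steps=500):
    """Run `policy` for several episodes and collect the total reward of each."""
    returns = []
    for _ in range(episodes):
        env = env_factory()
        obs = env.reset()
        total = 0.0
        for _ in range(max_steps):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break
        returns.append(total)
    return returns


def beats_baseline(agent_returns, baseline_returns, num_std=2.0):
    """True if the agent's mean return is clearly above the random baseline.

    "Clearly" is an assumption here: more than `num_std` baseline standard
    deviations above the baseline mean.
    """
    base_mean = statistics.mean(baseline_returns)
    base_std = statistics.pstdev(baseline_returns)
    return statistics.mean(agent_returns) > base_mean + num_std * base_std


class ToyEnv:
    """Hypothetical stand-in env: reward 1 when the action equals the observation."""

    def __init__(self, rng, length=50, num_actions=4):
        self.rng, self.length, self.num_actions = rng, length, num_actions

    def reset(self):
        self.t = 0
        self.target = self.rng.randrange(self.num_actions)
        return self.target

    def step(self, action):
        reward = 1.0 if action == self.target else 0.0
        self.t += 1
        self.target = self.rng.randrange(self.num_actions)
        return self.target, reward, self.t >= self.length


rng = random.Random(0)
make_env = lambda: ToyEnv(rng)
random_policy = lambda obs: rng.randrange(4)

baseline = rollout_returns(random_policy, make_env, episodes=30)
untrained = rollout_returns(random_policy, make_env, episodes=30)  # acts randomly
informed = rollout_returns(lambda obs: obs, make_env, episodes=30)  # uses the observation

print(beats_baseline(untrained, baseline))
print(beats_baseline(informed, baseline))
```

With a real setup, the random baseline would come from sampling actions in the actual Breakout environment, and the agent returns from the trained checkpoint's evaluation episodes.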

zsn2021 · 2022-12-31T15:25:16Z

I also encountered the same problem. I tuned the hyperparameters for a long time, but couldn't get good results in my environment.

yungangwu · 2022-12-31T15:28:20Z

Yes, I have this problem too. I also experimented with another codebase, muzero-pytorch, on Gomoku, but after tuning for a long time I still didn't get good results.

zsn2021 · 2022-12-31T15:33:34Z

Is it possible that having so many networks to learn is what leads to the decision failure?
If you like, you can share your contact information and we can communicate privately.

yungangwu · 2022-12-31T15:38:55Z

Yes, that's my guess too: it has three networks in series that need to be optimized together, so training has to be done very carefully to converge. As for contact information, I'm using the WeChat app. Do you know this app?

zsn2021 · 2022-12-31T15:40:50Z

You can add me on WeChat:
13162062294

yungangwu added the enhancement (New feature or request) label Oct 20, 2022

yungangwu closed this as completed Oct 20, 2022

yungangwu reopened this Oct 20, 2022
