
Same key used for each state during training of full rainbow #222

Open
mmcaulif opened this issue Sep 12, 2024 · 0 comments

Comments


mmcaulif commented Sep 12, 2024

Hi, while benchmarking my own implementation of Rainbow I have been using the Dopamine full version as a reference. I noticed that in the get_logits and get_q_values functions (lines 95 and 100 respectively) there is no axis mapped over the key, and later on in the train script only a single key is split for each of the forward passes.

I am just curious: given the use of Noisy Networks in the full Rainbow algorithm, is this implementation detail intentional? It will lead to the same noisy parameters for every state in the batch and thus add bias to the gradient updates. From my understanding, the alternative is to split the keys per forward pass and per input, which introduces some additional computation. I understand this additional bias might be negligible compared to the computation saved, but I am interested to hear :)
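For concreteness, here is a minimal JAX sketch of the two behaviours. The `noisy_forward` function is a hypothetical stand-in for a noisy-network forward pass, not Dopamine's actual code; only the key-handling pattern is the point.

```python
import jax
import jax.numpy as jnp

def noisy_forward(x, key):
    # Hypothetical stand-in for a noisy-network forward pass:
    # the key determines the sampled noisy parameters.
    noise = jax.random.normal(key, x.shape)
    return x + noise

batch = jnp.ones((4, 3))     # four states in the batch
key = jax.random.PRNGKey(0)

# Current behaviour: the key is not mapped over (in_axes=None),
# so every state in the batch sees the same noise sample.
shared = jax.vmap(noisy_forward, in_axes=(0, None))(batch, key)

# Alternative: split one key per state and map over the key axis,
# so each state draws independent noise.
keys = jax.random.split(key, batch.shape[0])
independent = jax.vmap(noisy_forward, in_axes=(0, 0))(batch, keys)
```

With the shared key, every row of `shared` is identical; with per-state keys, the rows of `independent` differ, at the cost of one extra `split` and a batched key array.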

Thanks for your time.
