Update after terminal state #13

Open
ericmock opened this issue Nov 19, 2017 · 2 comments

Comments

@ericmock

I think there's a small bug in many of your scripts: the returns for the last step are updated using a post-terminal step. As a result, your value (policy) functions end up growing (unbounded?) near the terminal state. For example, rl2/mountaincar has a "train" boolean, but it is never set to false for the last step.

@lazyprogrammer
Owner

Hmm, I only found this train flag in one file (pg_theano), where it is just an unused remnant of an old version. Could you elaborate on what you were referring to?

Actually, I did find one issue: most scripts don't set the value of the terminal state to 0 (which would make the return for the final step just the reward), but that doesn't sound like what you're referring to.
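
As a rough illustration of what treating the terminal value as 0 means for a one-step TD target, here is a minimal sketch. The names `td_target`, `next_value`, `done`, and `gamma` are hypothetical and are not taken from any of the scripts discussed here.

```python
def td_target(reward, next_value, done, gamma=0.99):
    """One-step TD target that treats the value of a terminal state as 0.

    All names are illustrative; this is not code from the repository.
    """
    if done:
        # Terminal transition: nothing follows, so the target is just the reward.
        return reward
    # Non-terminal transition: bootstrap from the estimate for the next state.
    return reward + gamma * next_value


# Same reward and next-state estimate, with and without the terminal check.
print(td_target(reward=-1.0, next_value=-50.0, done=False))
print(td_target(reward=-1.0, next_value=-50.0, done=True))
```

With `done=True` the next-state estimate is ignored entirely, so whatever the approximator predicts for the terminal state cannot leak into the target.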

@ericmock
Author

ericmock commented Feb 4, 2018

It's been a while since I thought about this, but I think my kludge fix of not updating on the last step is effectively (but not precisely) setting the value of the terminal state to zero. Setting the value of the terminal state to zero fixes the fundamental issue, which is that the value function grows to extremely large values (i.e. much larger than the maximum possible reward).
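
One way this growth could arise can be sketched with a toy tabular TD(0) example. Everything here is an assumption made for illustration: a two-state chain (state 0 leads to terminal state 1), a reward of -1 per step, `gamma = 1.0`, and a post-terminal update modeled as the terminal state transitioning to itself. It is not a reproduction of the actual scripts.

```python
def run_td(episodes, fix_terminal, alpha=0.5, gamma=1.0):
    """Tabular TD(0) on a toy chain: state 0 -> terminal state 1, reward -1.

    Hypothetical setup for illustration only. When fix_terminal is False, an
    extra post-terminal update (terminal -> terminal, reward -1) is applied in
    every episode, mimicking an update made after the episode has ended.
    """
    V = [0.0, 0.0]  # V[1] is the estimate for the terminal state
    for _ in range(episodes):
        # Transition 0 -> 1: bootstrap from V[1] unless the terminal value is pinned to 0.
        next_value = 0.0 if fix_terminal else V[1]
        V[0] += alpha * (-1.0 + gamma * next_value - V[0])
        if not fix_terminal:
            # Post-terminal step: the terminal state bootstraps from itself.
            V[1] += alpha * (-1.0 + gamma * V[1] - V[1])
    return V


print("terminal value pinned to 0:", run_td(1000, fix_terminal=True))
print("post-terminal updates kept:", run_td(1000, fix_terminal=False))
```

Under these assumptions, pinning the terminal value to 0 lets V[0] settle near -1, while keeping the post-terminal update makes V[1] drift further negative every episode and drags V[0] along with it. With `gamma < 1` the drift in this toy setup would instead level off at -1 / (1 - gamma), which is still far larger in magnitude than any single reward.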


2 participants