Policy iteration solution only shows 1 optimal solution #212
The probability distribution for the end-game (terminal) states must be 0.
def policy_eval(policy, env, V=np.zeros(env.nS), discount_factor=1.0, theta=0.00001):
def policy_improvement(env, policy_eval_fn=policy_eval, discount_factor=1.0):
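The `policy_eval` signature above uses `V = np.zeros(env.nS)` as a default argument. Python evaluates default values once, when the `def` statement runs. That means a global `env` must already exist at definition time, and every call that omits `V` receives the same array object. If the function writes into `V`, later calls silently start from the previous result. Below is a minimal sketch of the usual `None`-default idiom, which keeps warm-starting as an explicit option. The toy corridor MDP and `make_corridor` are hypothetical, and the `env.nS` / `env.nA` / `env.P[s][a]` layout is assumed to mirror the notebook's gridworld env.

```python
import numpy as np
from types import SimpleNamespace


def make_corridor(n_states=3):
    """Hypothetical toy MDP in the P[s][a] -> [(prob, next_state, reward, done)]
    format. State 0 is terminal; action 0 moves left, action 1 moves right."""
    P = {0: {a: [(1.0, 0, 0.0, True)] for a in range(2)}}
    for s in range(1, n_states):
        left, right = s - 1, min(s + 1, n_states - 1)
        P[s] = {0: [(1.0, left, -1.0, left == 0)],
                1: [(1.0, right, -1.0, False)]}
    return SimpleNamespace(nS=n_states, nA=2, P=P)


def policy_eval(policy, env, V=None, discount_factor=1.0, theta=0.00001):
    # A fresh zero vector is built on every call (not once at def time).
    # Passing a previous V warm-starts the sweeps; np.array copies it,
    # so the caller's array is never modified in place.
    V = np.zeros(env.nS) if V is None else np.array(V, dtype=float)
    while True:
        delta = 0.0
        for s in range(env.nS):
            # Bellman expectation backup for state s under the given policy.
            v = 0.0
            for a, action_prob in enumerate(policy[s]):
                for prob, next_state, reward, _ in env.P[s][a]:
                    v += action_prob * prob * (reward + discount_factor * V[next_state])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


env = make_corridor()
always_left = np.tile([1.0, 0.0], (env.nS, 1))  # one-hot rows: always move left
V = policy_eval(always_left, env)               # values 0, -1, -2
V_again = policy_eval(always_left, env, V)      # warm start from the previous result
```

Copying the input is a design choice in this sketch. Writing into the caller's array would also work, but it makes the function's side effects harder to see.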
Hey guys, I just started learning RL, so this is more of a question.
I noticed that in the slides/lecture, the gridworld problem has more than one optimal solution. But in DP>policy_iteration_solution.ipynb, you used argmax() to choose only one action for each state (even though two actions can share the same max value).
So the final answer is still correct, but the "Policy Probability Distribution" that gets printed is kind of bugging me, because it shows only one action per state even where several actions are optimal.
Should this be changed?
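A minimal sketch of the change being asked about. `np.argmax` returns only the first index of the maximum, so a one-hot policy built from it drops tied actions. The alternative is to spread probability evenly over every action whose one-step lookahead value ties the maximum:

$$\pi'(a \mid s) = \frac{\mathbb{1}[a \in A^*(s)]}{|A^*(s)|}, \qquad A^*(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma V(s')\bigr]$$

Everything below is an illustrative assumption, not the notebook's code. The 4x4 gridworld is re-created by a hypothetical `make_gridworld` (actions UP/RIGHT/DOWN/LEFT = 0..3, terminal corners 0 and 15, reward -1 per step) in an assumed `env.P` format. The helper names are made up, and ties are detected with `np.isclose` rather than exact equality. Terminal rows stay all zero, matching the end-game comment above.

```python
import numpy as np
from types import SimpleNamespace

UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3


def make_gridworld(width=4):
    """Hypothetical width x width gridworld in the env.P format
    P[s][a] = [(prob, next_state, reward, done)]. The corner states
    0 and nS - 1 are terminal; every other move costs -1."""
    nS = width * width
    terminal_states = {0, nS - 1}
    P = {}
    for s in range(nS):
        if s in terminal_states:
            P[s] = {a: [(1.0, s, 0.0, True)] for a in range(4)}
            continue
        row, col = divmod(s, width)
        moves = {  # bumping into a wall leaves the agent where it is
            UP: s if row == 0 else s - width,
            RIGHT: s if col == width - 1 else s + 1,
            DOWN: s if row == width - 1 else s + width,
            LEFT: s if col == 0 else s - 1,
        }
        P[s] = {a: [(1.0, ns, -1.0, ns in terminal_states)] for a, ns in moves.items()}
    return SimpleNamespace(nS=nS, nA=4, P=P, terminal_states=terminal_states)


def policy_eval(policy, env, discount_factor=1.0, theta=1e-5):
    # Iterative (in-place) policy evaluation, starting from V = 0.
    V = np.zeros(env.nS)
    while True:
        delta = 0.0
        for s in range(env.nS):
            v = sum(action_prob * prob * (reward + discount_factor * V[ns])
                    for a, action_prob in enumerate(policy[s])
                    for prob, ns, reward, _ in env.P[s][a])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def one_step_lookahead(env, s, V, discount_factor):
    # Action values q(s, a) computed from the current state values V.
    q = np.zeros(env.nA)
    for a in range(env.nA):
        for prob, ns, reward, _ in env.P[s][a]:
            q[a] += prob * (reward + discount_factor * V[ns])
    return q


def greedy_policy_with_ties(env, V, discount_factor=1.0):
    policy = np.zeros((env.nS, env.nA))
    for s in range(env.nS):
        if s in env.terminal_states:
            continue  # terminal rows stay all zero: no action is taken there
        q = one_step_lookahead(env, s, V, discount_factor)
        best = np.isclose(q, q.max())  # every maximizing action, not only the first
        policy[s] = best / best.sum()  # split probability evenly among the ties
    return policy


def policy_improvement(env, discount_factor=1.0, max_iterations=100):
    policy = np.ones((env.nS, env.nA)) / env.nA  # start from the uniform random policy
    for _ in range(max_iterations):
        V = policy_eval(policy, env, discount_factor)
        new_policy = greedy_policy_with_ties(env, V, discount_factor)
        if np.allclose(new_policy, policy):  # policy is stable: stop
            return policy, V
        policy = new_policy
    raise RuntimeError("policy iteration did not converge")


env = make_gridworld()
policy, V = policy_improvement(env)
print(V.reshape(4, 4))
print(policy[5])  # state 5 (row 1, col 1): UP and LEFT are both optimal, 0.5 each
```

The tolerance in `np.isclose` is a judgment call. It keeps floating-point noise from splitting a genuine tie. In this gridworld the final values are small integers, so exact ties are found either way.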
Btw, here is my code:
And the "Policy Probability Distribution" should look like this: