Monte Carlo AssertionError: defaultdict(<function mc_control_importance_sampling.<locals>.<lambda> at 0x7f31699ffe18>, {}) (<class 'collections.defaultdict'>)
#231
I have been working on a DQN using stable baselines and a discrete environment with 3 actions. I am using the RL tutorial https://github.com/dennybritz/reinforcement-learning/blob/master/MC/MC%20Control%20with%20Epsilon-Greedy%20Policies%20Solution.ipynb for reference.
But I am having some issues with my helper functions for the Monte Carlo methods:
from collections import defaultdict

import numpy as np

def mc_control_importance_sampling(env, num_episodes, discount=0.99):
    """
    Off-policy Monte Carlo control using weighted importance sampling.
    Finds an optimal greedy policy.
    """
    # Q maps each observation to an array of action values
    Q = defaultdict(lambda: np.zeros(env.action_space.n))
    # C accumulates the importance-sampling weights
    C = defaultdict(lambda: np.zeros(env.action_space.n))
    # learn greedy policy
    target_policy = env.step(Q)
    for i_episode in range(1, num_episodes + 1):
        if i_episode % 1000 == 0:
            print("\rEpisode {}/{}.".format(i_episode, num_episodes), end="")
        # Generate an episode as a list of (obs, action, reward) tuples
        episode = []
        obs = env.reset()
        for t in range(100):
            # Sample an action from the behavior policy (the trained DQN)
            action, _states = trained_model.predict(obs)
            next_obs, reward, done, _ = env.step(action)
            episode.append((obs, action, reward))
            if done:
                break
            obs = next_obs
        # Sum of discounted returns
        G = 0.0
        # Importance-sampling weight
        W = 1.0
        # Walk the episode backwards
        for t in range(len(episode))[::-1]:
            obs, action, reward = episode[t]
            G = discount * G + reward
            # Accumulate the weight for this state-action pair
            C[obs][action] += W
            # Incremental weighted update of Q toward the return
            Q[obs][action] += (W / C[obs][action]) * (G - Q[obs][action])
            # Stop once the behavior action diverges from the greedy target
            if action != np.argmax(target_policy(obs)):
                break
            W = W * 1. / behavior_policy(obs)[action]
    return Q, target_policy
When I call the function,
Q, policy = mc_control_importance_sampling(env, num_episodes=500000)
I get the error:
AssertionError Traceback (most recent call last)
<ipython-input-58-eb968b9ff6e3> in <module>()
----> 1 Q, policy = mc_control_importance_sampling(env, num_episodes=500000)
1 frames
/content/gym_fishing/gym_fishing/envs/fishing_env.py in step(self, action)
76 def step(self, action):
77
---> 78 assert self.action_space.contains(action), "%r (%s) invalid"%(action, type(action))
79
80 if self.n_actions > 3:
AssertionError: defaultdict(<function mc_control_importance_sampling.<locals>.<lambda> at 0x7f31699ffe18>, {}) (<class 'collections.defaultdict'>) invalid
I am not sure how to fix this, so any help would be appreciated.
Thanks!
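For what it's worth, the traceback seems to point at the line target_policy = env.step(Q): env.step() asserts that its argument is a valid action in the environment's action space, and here the entire Q defaultdict is being passed as the action, which is exactly the object printed in the assertion message. The target policy should be built from Q, not by calling the environment. The off-policy MC control solution in the same tutorial repo does this with a small helper; below is a minimal sketch of that idea (the name create_greedy_policy follows that notebook, and the rest is illustrative, not tested against gym_fishing):

import numpy as np

def create_greedy_policy(Q):
    # Return a function that maps an observation to one-hot
    # action probabilities, greedy with respect to Q.
    def policy_fn(obs):
        probs = np.zeros_like(Q[obs], dtype=float)
        probs[np.argmax(Q[obs])] = 1.0
        return probs
    return policy_fn

# In mc_control_importance_sampling, replace
#     target_policy = env.step(Q)
# with
#     target_policy = create_greedy_policy(Q)

Two related things worth checking once that assertion is gone: the weight update W = W * 1. / behavior_policy(obs)[action] needs the probability the behavior policy assigned to the sampled action, and trained_model.predict(obs) only returns an action, so you would need to pass in (or wrap the DQN as) a function that returns action probabilities; and if the environment's observations are numpy arrays, they are not hashable and cannot key the Q and C defaultdicts directly, so converting them to tuples first (e.g. tuple(obs)) is one workaround.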