In the function `mc_control_epsilon_greedy`:

```python
# Find all (state, action) pairs we've visited in this episode
# We convert each state to a tuple so that we can use it as a dict key
sa_in_episode = set([(tuple(x[0]), x[1]) for x in episode])
for state, action in sa_in_episode:
    sa_pair = (state, action)
    # Find the first occurrence of the (state, action) pair in the episode
    first_occurence_idx = next(i for i, x in enumerate(episode)
                               if x[0] == state and x[1] == action)
    # Sum up all rewards since the first occurrence
    G = sum([x[2] * (discount_factor ** i)
             for i, x in enumerate(episode[first_occurence_idx:])])
    # Calculate average return for this state over all sampled episodes
    returns_sum[sa_pair] += G
    returns_count[sa_pair] += 1.0
    Q[state][action] = returns_sum[sa_pair] / returns_count[sa_pair]

# The policy is improved implicitly by changing the Q dictionary
return Q, policy
```
I think a line should be added before the last line:
```python
Q[state][action] = returns_sum[sa_pair] / returns_count[sa_pair]
# The policy is improved implicitly by changing the Q dictionary
policy = make_epsilon_greedy_policy(Q, epsilon, env.action_space.n)
return Q, policy
```
Otherwise the policy will not update.
@Ritz111 No, it will update. The policy is already updating as the Q values update, because the policy function selects actions according to the current contents of `Q` every time it is called.
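To make the mechanism concrete, here is a minimal, self-contained sketch of a closure-based epsilon-greedy policy in the style of the notebook's `make_epsilon_greedy_policy` (the exact body in the repo may differ slightly). Because `policy_fn` reads `Q` at call time rather than copying it, mutating `Q` in place is enough to change the policy's behavior:

```python
import numpy as np
from collections import defaultdict

def make_epsilon_greedy_policy(Q, epsilon, nA):
    """Return a function mapping an observation to action probabilities."""
    def policy_fn(observation):
        # Uniform exploration mass of epsilon spread over all nA actions
        A = np.ones(nA, dtype=float) * epsilon / nA
        # Greedy action according to the *current* contents of Q
        best_action = np.argmax(Q[observation])
        A[best_action] += 1.0 - epsilon
        return A
    return policy_fn

# Demo: the policy sees in-place updates to Q without being rebuilt
Q = defaultdict(lambda: np.zeros(2))
policy = make_epsilon_greedy_policy(Q, epsilon=0.1, nA=2)
print(policy("s"))   # [0.95, 0.05] -- greedy toward action 0
Q["s"][1] = 1.0      # update Q in place, as the control loop does
print(policy("s"))   # [0.05, 0.95] -- greedy toward action 1, no rebuild
```

This is also why the proposed extra line is redundant: calling `make_epsilon_greedy_policy(Q, ...)` again would just create a new closure over the same `Q` dictionary, producing identical behavior.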