A problem about Q updates #20

Open
painterner opened this issue Oct 26, 2018 · 0 comments

Comments

@painterner
Hello! I can't understand this (lines 389–407 in run_summarization.py): why does "dqn_best_action" use state rather than state_prime? Also, in model.py we have dist_q_val = -tf.log(dist) * q_value, which means we want dist and q_value to be close to each other, right? Shouldn't we use ||Q - q||^2 instead (Eq. 29 of https://arxiv.org/pdf/1805.09461.pdf)?

    # line 389
    q_estimates = dqn_results['estimates']  # shape (len(transitions), vocab_size)
    dqn_best_action = dqn_results['best_action']
    #dqn_q_estimate_loss = dqn_results['loss']

    # use target DQN to estimate values for the next decoder state
    dqn_target_results = self.dqn_target.run_test_steps(self.dqn_sess, x=b_prime._x)
    q_vals_new_t = dqn_target_results['estimates']  # shape (len(transitions), vocab_size)

    # line 407, inside a loop over the transitions (tr, indexed by i)
    q_estimates[i][tr.action] = tr.reward + FLAGS.gamma * q_vals_new_t[i][dqn_best_action[i]]
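For context, here is a minimal NumPy sketch of the standard Double DQN target (Eq. 4 of https://arxiv.org/abs/1509.06461), where the online network selects the argmax action at the next state and the target network evaluates it. The function and argument names are mine, not from this repo:

    import numpy as np

    def double_dqn_targets(q_online_next, q_target_next, rewards, gamma):
        """Double DQN target for a batch of transitions.

        q_online_next: (batch, vocab_size) online-net Q values at state_prime
        q_target_next: (batch, vocab_size) target-net Q values at state_prime
        rewards:       (batch,) immediate rewards
        """
        # the online network picks the best next action at state_prime ...
        best_actions = np.argmax(q_online_next, axis=1)
        # ... and the target network evaluates that action
        batch_idx = np.arange(len(rewards))
        return rewards + gamma * q_target_next[batch_idx, best_actions]

If I read the quoted code correctly, dqn_results (and hence dqn_best_action) comes from a pass over the current state, not over b_prime._x, which is why the update looks inconsistent with this target to me.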
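And for the second question, a small TF 1.x sketch of the two loss formulations I am contrasting (dist and q_value as in model.py; the ||Q - q||^2 form is my reading of Eq. 29, so treat it as an assumption):

    import tensorflow as tf  # TF 1.x, as used by this repo

    # dist:    (batch, vocab_size) decoder output distribution
    # q_value: (batch, vocab_size) Q estimates from the DQN
    # As in model.py: cross-entropy weighted by the Q values
    dist_q_val = -tf.log(dist) * q_value
    loss_model_py = tf.reduce_sum(dist_q_val, axis=1)

    # What I would expect from Eq. 29: a squared-error loss ||Q - q||^2
    loss_l2 = tf.reduce_sum(tf.square(q_value - dist), axis=1)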