Merge pull request dennybritz#134 from keithmgould/master
update value estimator only after calculating advantage
dennybritz authored Jan 29, 2018
2 parents 2a6fe49 + 30326df commit 5334a6f
Showing 1 changed file with 2 additions and 2 deletions.
@@ -196,11 +196,11 @@
 " for t, transition in enumerate(episode):\n",
 " # The return after this timestep\n",
 " total_return = sum(discount_factor**i * t.reward for i, t in enumerate(episode[t:]))\n",
-" # Update our value estimator\n",
-" estimator_value.update(transition.state, total_return)\n",
 " # Calculate baseline/advantage\n",
 " baseline_value = estimator_value.predict(transition.state) \n",
 " advantage = total_return - baseline_value\n",
+" # Update our value estimator\n",
+" estimator_value.update(transition.state, total_return)\n",
 " # Update our policy estimator\n",
 " estimator_policy.update(transition.state, advantage, transition.action)\n",
 " \n",
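The effect of the reordering can be illustrated with a minimal sketch. It assumes a hypothetical tabular value estimator: the `TabularValueEstimator` class and the `advantages` helper are illustrative only and are not the notebook's `estimator_value`, which is a learned function approximator. Each update moves V(s) a fraction `learning_rate` of the way toward the return. When the estimator is updated first, the baseline it then predicts has already been pulled toward `total_return`, so the advantage shrinks to (1 - learning_rate) times its intended value. Computing the advantage before the update, as this commit does, uses the baseline from before the current return was seen.

```python
from collections import defaultdict


class TabularValueEstimator:
    """Hypothetical stand-in for the notebook's estimator_value: one value per state."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.values = defaultdict(float)  # unseen states start at V(s) = 0.0

    def predict(self, state):
        return self.values[state]

    def update(self, state, target):
        # Move V(s) a fraction learning_rate of the way toward the target return.
        self.values[state] += self.learning_rate * (target - self.values[state])


def advantages(rewards, states, discount_factor, update_first, learning_rate=0.5):
    """Per-step advantages G_t - V(s_t) for one episode, under either update order."""
    estimator = TabularValueEstimator(learning_rate)
    result = []
    for t, state in enumerate(states):
        # Discounted return from step t onward, same formula as the notebook.
        total_return = sum(discount_factor**i * r for i, r in enumerate(rewards[t:]))
        if update_first:
            # Order before this commit: the baseline has already moved toward total_return.
            estimator.update(state, total_return)
            advantage = total_return - estimator.predict(state)
        else:
            # Order after this commit: the baseline is predicted before the update.
            advantage = total_return - estimator.predict(state)
            estimator.update(state, total_return)
        result.append(advantage)
    return result


print(advantages([1.0, 1.0], ["a", "b"], 1.0, update_first=False))  # [2.0, 1.0]
print(advantages([1.0, 1.0], ["a", "b"], 1.0, update_first=True))   # [1.0, 0.5]
```

In this tabular setting, `learning_rate=1.0` with the old order makes every advantage zero, so the policy update would receive no signal at all. For a neural-network estimator the shrinkage is not an exact constant factor, but a gradient step on the current return will typically bias the baseline toward the very return it is subtracted from.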