Hello #241

Open
wants to merge 91 commits into base: revert-97-patch-1

Changes from all commits (91 commits)
67e98a9
Merge pull request #98 from dennybritz/revert-97-patch-1
dennybritz Jul 8, 2017
71223d5
DQN copy_model_parameters memory leak fixed, tensorboard summaries up…
Kismuz Jul 8, 2017
e7085b2
Merge pull request #99 from Kismuz/DQN-copy-model-params-fix
dennybritz Jul 9, 2017
2b576bd
Update description of env.P[s][a]
sstarzycki Jul 21, 2017
762f34c
Merge pull request #102 from sstarzycki/patch-1
dennybritz Jul 21, 2017
1f04c1d
bind worker within lambda to avoid running worker twice
himanshusahni Oct 4, 2017
5364539
Merge pull request #111 from himanshusahni/master
dennybritz Oct 4, 2017
bc7ee05
worker name scope should have trailing slash otherwise any worker…
himanshusahni Oct 10, 2017
18100ea
Merge pull request #113 from himanshusahni/master
dennybritz Oct 12, 2017
3611ec9
Fixed some of the issues with the DQN script as pointed out in #117
praveen-palanisamy Nov 1, 2017
e9068bf
Updated to support recent versions of TF. Removed deprecated function…
praveen-palanisamy Nov 1, 2017
0b2ae41
Fixed issues with the DQN in the exercise notebook
praveen-palanisamy Nov 1, 2017
10ce5dc
Fixed typo
praveen-palanisamy Nov 1, 2017
094ebf7
Merge pull request #118 from praveen-palanisamy/master
dennybritz Nov 3, 2017
60013e5
Sync function descriptions. Lambda -> gamma (discount factor). Added …
Nov 16, 2017
4307667
Updates function description in DP. Fixed typos in MC. Changed Lambda…
Nov 21, 2017
7017f9e
Fix links in all the `README.md`s
jonahweissman Nov 22, 2017
da612e5
Change kernel to python3
Nov 23, 2017
90b0a4a
Merge pull request #119 from BAILOOL/master
dennybritz Nov 23, 2017
7a31a2b
Merge pull request #120 from jonahweissman/master
dennybritz Nov 23, 2017
79cadc0
Lambda to Gamma. Updated Readme.
Nov 24, 2017
74d301c
Merge pull request #121 from BAILOOL/master
dennybritz Nov 25, 2017
3fce6b5
Updated Readme. Changed Lambda to Gamma
Dec 1, 2017
152dbc4
Updated link to Sutton's book
Dec 6, 2017
9ee6cdd
Updated link to Sutton's book
Dec 6, 2017
f45bcbf
Merge pull request #123 from BAILOOL/master
dennybritz Dec 6, 2017
dee1e01
DQN: Fixed typos. Changed lambda to gamma. Updated Readme
Dec 7, 2017
85565ec
"Policy Gradient Methods" is chapter 13 now
BAILOOL Dec 27, 2017
f637c42
"Policy Gradient Methods" chapter is completed. Updated OpenAI Gym li…
BAILOOL Dec 27, 2017
1f2e2eb
Fixed broken links to Solutions in PolicyGradient
BAILOOL Dec 27, 2017
783c2c3
Mod. estimator_value comment in actor-critic
Dec 28, 2017
61da56c
Merge pull request #126 from BAILOOL/master
dennybritz Dec 30, 2017
d8136b4
Updated links to new version of Sutton's book
Jan 3, 2018
fa95c7e
Merge pull request #129 from BAILOOL/master
dennybritz Jan 4, 2018
30326df
update value estimator only after calculating advantage
keithmgould Jan 24, 2018
9454010
Minor fix: sync sample policy with the solution
ByzanTine Jan 28, 2018
2a6fe49
Merge pull request #137 from ByzanTine/master
dennybritz Jan 29, 2018
5334a6f
Merge pull request #134 from keithmgould/master
dennybritz Jan 29, 2018
6211e2d
Add one step lookahead function for easy comparison with Value Iteration
Feb 19, 2018
e030ecf
Add value check assertion
Feb 19, 2018
edcba6b
Fix step and reset NotImplementedError
Feb 19, 2018
ba12f97
Update playground output
Feb 19, 2018
21a8e31
Merge pull request #145 from activatedgeek/fix-blackjack-env
dennybritz Feb 20, 2018
5e3b1fc
Merge pull request #144 from activatedgeek/refactor-lookahead
dennybritz Feb 20, 2018
8da669c
Fix missing render()
Feb 20, 2018
8933e3f
Merge pull request #146 from activatedgeek/fix-TD-envs
dennybritz Feb 21, 2018
542cbf0
Fix typo in MC Control
jonahweissman Mar 7, 2018
9baba87
Merge pull request #148 from jonahweissman/patch-1
dennybritz Mar 8, 2018
c90ebaf
correction for state processor output shape
ayberkydn Apr 13, 2018
e67b3cf
Merge pull request #155 from aybberk/master
dennybritz Apr 14, 2018
56f893c
typo fix and correction for state processor output shape
ayberkydn Apr 14, 2018
521b5a9
Merge pull request #157 from aybberk/master
dennybritz Apr 15, 2018
07dd722
added the equation reference
May 26, 2018
377c875
added Sutton book's equation
May 26, 2018
782b951
Merge pull request #162 from byorxyz/added_comment
dennybritz May 27, 2018
1b5c06f
Gambler's problem (ex.4.3) added.
May 28, 2018
167525b
Merge pull request #163 from byorxyz/ex.4.3
dennybritz May 28, 2018
be7cfe3
just formatting
May 28, 2018
4f0d942
updated the broken link
May 28, 2018
fe3edfc
fix #89
May 29, 2018
dfef331
Merge pull request #164 from byorxyz/ex.4.3
dennybritz May 29, 2018
49631ce
Update README.md
shar1pius Sep 20, 2018
fd13eca
Merge pull request #175 from Sharwon/patch-1
dennybritz Sep 21, 2018
b47c920
updates to README.md
jovsa Dec 24, 2018
8e8a21b
Merge pull request #187 from JovanSardinha/master
dennybritz Dec 25, 2018
57f71cd
imported io so that StringIO() would work
jovsa Dec 25, 2018
9ad2689
added documentation for _render()
jovsa Dec 25, 2018
0fe550c
documented structure for P[s][a]
jovsa Dec 25, 2018
30b2304
removed extra whitespace
jovsa Dec 25, 2018
01b8b13
nit
jovsa Dec 25, 2018
cee9e78
Merge pull request #188 from JovanSardinha/master
dennybritz Dec 26, 2018
120fbcf
Add link to Advanced Deep Learning & Reinforcement Learning lectures …
Feb 27, 2019
4a2df43
fixed shape descriptions for neural network input layer
alek5k Mar 1, 2019
e4797f2
Merge pull request #194 from alek5k/dqn-fixes
dennybritz Mar 2, 2019
706b13a
Merge pull request #193 from fspirit/add_deepmind_course_link
dennybritz Mar 2, 2019
a35df15
Updated links to new version of Sutton's book
PieroMacaluso Mar 13, 2019
b7b4d3d
Merge pull request #195 from pieromacaluso/master
dennybritz Mar 13, 2019
bb9241d
Fix rendering crash on Win 10
fspirit Mar 29, 2019
1abaae4
Q-Learning docstring improvements.
anuzis Apr 2, 2019
59cded5
Merge pull request #199 from anuzis/master
dennybritz Apr 3, 2019
b2d179a
Update CliffWalk REINFORCE with Baseline Solution.ipynb
guotong1988 Jun 11, 2019
9dce017
Merge pull request #201 from guotong1988/patch-1
dennybritz Jun 24, 2019
92205f5
Merge pull request #198 from fspirit/fix-rendering-crash-on-win-10
dennybritz Jun 24, 2019
775fd81
Update Policy Iteration Solution.ipynb
quanturk Oct 1, 2019
7d23260
Update Policy Iteration Solution.ipynb
quanturk Oct 1, 2019
89dacf5
Merge pull request #214 from nsydn/master
dennybritz Oct 2, 2019
1298c8d
Update README.md
roshray Nov 8, 2019
06ce312
Merge pull request #216 from roshray/patch-1
dennybritz Nov 9, 2019
40eda6b
Compatible with gym==0.26
arielsboiardi Sep 19, 2022
d173521
Corrected import
arielsboiardi Sep 20, 2022
2b83228
Merge pull request #245 from arielsboiardi/master
dennybritz Sep 20, 2022
293 changes: 293 additions & 0 deletions DP/Gamblers Problem Solution.ipynb

Large diffs are not rendered by default.

158 changes: 158 additions & 0 deletions DP/Gamblers Problem.ipynb
@@ -0,0 +1,158 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### This is Example 4.3. Gambler’s Problem from Sutton's book.\n",
"\n",
"A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. \n",
"If the coin comes up heads, he wins as many dollars as he has staked on that flip; \n",
"if it is tails, he loses his stake. The game ends when the gambler wins by reaching his goal of $100, \n",
"or loses by running out of money. \n",
"\n",
"On each flip, the gambler must decide what portion of his capital to stake, in integer numbers of dollars. \n",
"This problem can be formulated as an undiscounted, episodic, finite MDP. \n",
"\n",
"The state is the gambler’s capital, s ∈ {1, 2, . . . , 99}.\n",
"The actions are stakes, a ∈ {0, 1, . . . , min(s, 100 − s)}. \n",
"The reward is zero on all transitions except those on which the gambler reaches his goal, when it is +1.\n",
"\n",
"The state-value function then gives the probability of winning from each state. A policy is a mapping from levels of capital to stakes. The optimal policy maximizes the probability of reaching the goal. Let p_h denote the probability of the coin coming up heads. If p_h is known, then the entire problem is known and it can be solved, for instance, by value iteration.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import sys\n",
"import matplotlib.pyplot as plt\n",
"if \"../\" not in sys.path:\n",
" sys.path.append(\"../\") "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"\n",
"### Exercise 4.9 (programming)\n",
"\n",
"Implement value iteration for the gambler’s problem and solve it for p_h = 0.25 and p_h = 0.55.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def value_iteration_for_gamblers(p_h, theta=0.0001, discount_factor=1.0):\n",
" \"\"\"\n",
" Args:\n",
" p_h: Probability of the coin coming up heads\n",
" \"\"\"\n",
" \n",
" def one_step_lookahead(s, V, rewards):\n",
" \"\"\"\n",
" Helper function to calculate the value for all action in a given state.\n",
" \n",
" Args:\n",
" s: The gambler’s capital. Integer.\n",
" V: The vector that contains values at each state. \n",
" rewards: The reward vector.\n",
" \n",
" Returns:\n",
" A vector containing the expected value of each action. \n",
" Its length equals to the number of actions.\n",
" \"\"\"\n",
" \n",
" # Implement!\n",
" \n",
" return A\n",
" \n",
" # Implement!\n",
" \n",
" return policy, V"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"policy, v = value_iteration_for_gamblers(0.25)\n",
"\n",
"print(\"Optimized Policy:\")\n",
"print(policy)\n",
"print(\"\")\n",
"\n",
"print(\"Optimized Value Function:\")\n",
"print(v)\n",
"print(\"\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Plotting Final Policy (action stake) vs State (Capital)\n",
"\n",
"# Implement!"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Plotting Capital vs Final Policy\n",
"\n",
"# Implement!\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
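
Since the diff for `DP/Gamblers Problem Solution.ipynb` is not rendered above, here is a minimal sketch of what the `value_iteration_for_gamblers` exercise asks for, assuming the state and stake ranges from the notebook's problem statement (capital s ∈ {1, ..., 99}, goal of $100, reward +1 only on reaching the goal). The `rewards` vector layout and the deterministic policy extraction are choices of this sketch, not necessarily what the solution file contains:

```python
import numpy as np

def value_iteration_for_gamblers(p_h, theta=0.0001, discount_factor=1.0):
    """Value iteration for the Gambler's Problem (Sutton & Barto, Example 4.3)."""
    # Reward is +1 only on reaching the goal of $100, zero otherwise.
    rewards = np.zeros(101)
    rewards[100] = 1
    # V covers capital 0..100, including the two terminal states.
    V = np.zeros(101)

    def one_step_lookahead(s, V, rewards):
        # Expected value of each stake a in {1, ..., min(s, 100 - s)}.
        A = np.zeros(101)
        for a in range(1, min(s, 100 - s) + 1):
            # Heads (prob. p_h): capital becomes s + a; tails: s - a.
            A[a] = p_h * (rewards[s + a] + discount_factor * V[s + a]) \
                 + (1 - p_h) * (rewards[s - a] + discount_factor * V[s - a])
        return A

    # Sweep until the value function changes by less than theta everywhere.
    while True:
        delta = 0
        for s in range(1, 100):
            best_action_value = np.max(one_step_lookahead(s, V, rewards))
            delta = max(delta, np.abs(best_action_value - V[s]))
            V[s] = best_action_value
        if delta < theta:
            break

    # Extract a deterministic policy: one optimal stake per capital level.
    policy = np.zeros(100)
    for s in range(1, 100):
        policy[s] = np.argmax(one_step_lookahead(s, V, rewards))

    return policy, V
```

With `p_h = 0.25` the resulting `V[s]` approximates the probability of winning from each capital level, which is exactly the interpretation given in the markdown cell above.
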
56 changes: 23 additions & 33 deletions DP/Policy Evaluation Solution.ipynb
@@ -2,12 +2,11 @@
"cells": [
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from IPython.core.debugger import set_trace\n",
"import numpy as np\n",
"import pprint\n",
"import sys\n",
@@ -18,10 +17,8 @@
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": true
},
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"pp = pprint.PrettyPrinter(indent=2)\n",
@@ -30,10 +27,8 @@
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def policy_eval(policy, env, discount_factor=1.0, theta=0.00001):\n",
@@ -43,9 +38,11 @@
" Args:\n",
" policy: [S, A] shaped matrix representing the policy.\n",
" env: OpenAI env. env.P represents the transition probabilities of the environment.\n",
" env.P[s][a] is a (prob, next_state, reward, done) tuple.\n",
" env.P[s][a] is a list of transition tuples (prob, next_state, reward, done).\n",
" env.nS is a number of states in the environment. \n",
" env.nA is a number of actions in the environment.\n",
" theta: We stop evaluation once our value function change is less than theta for all states.\n",
" discount_factor: lambda discount factor.\n",
" discount_factor: Gamma discount factor.\n",
" \n",
" Returns:\n",
" Vector of length env.nS representing the value function.\n",
@@ -61,7 +58,7 @@
" for a, action_prob in enumerate(policy[s]):\n",
" # For each action, look at the possible next states...\n",
" for prob, next_state, reward, done in env.P[s][a]:\n",
" # Calculate the expected value\n",
" # Calculate the expected value. Ref: Sutton book eq. 4.6.\n",
" v += action_prob * prob * (reward + discount_factor * V[next_state])\n",
" # How much our value function changed (across any states)\n",
" delta = max(delta, np.abs(v - V[s]))\n",
@@ -74,10 +71,8 @@
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"random_policy = np.ones([env.nS, env.nA]) / env.nA\n",
@@ -86,10 +81,8 @@
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -98,7 +91,8 @@
"Value Function:\n",
"[ 0. -13.99993529 -19.99990698 -21.99989761 -13.99993529\n",
" -17.9999206 -19.99991379 -19.99991477 -19.99990698 -19.99991379\n",
" -17.99992725 -13.99994569 -21.99989761 -19.99991477 -13.99994569 0. ]\n",
" -17.99992725 -13.99994569 -21.99989761 -19.99991477 -13.99994569\n",
" 0. ]\n",
"\n",
"Reshaped Grid Value Function:\n",
"[[ 0. -13.99993529 -19.99990698 -21.99989761]\n",
@@ -121,10 +115,8 @@
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Test: Make sure the evaluated policy is what we expected\n",
@@ -135,9 +127,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": []
}
@@ -158,9 +148,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 1
}
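
The hunks above show `policy_eval` only in fragments. For context, here is a self-contained sketch of how the function reads once the visible changes are applied; the parts hidden by the collapsed regions are filled in from the standard iterative-policy-evaluation algorithm, so treat this as a reconstruction rather than the file's exact contents:

```python
import numpy as np

def policy_eval(policy, env, discount_factor=1.0, theta=0.00001):
    """Evaluate a policy given full knowledge of the environment's dynamics."""
    # Start with an all-zero value function.
    V = np.zeros(env.nS)
    while True:
        delta = 0
        for s in range(env.nS):
            v = 0
            # Look at the possible next actions under the policy...
            for a, action_prob in enumerate(policy[s]):
                # ...and, for each action, at the possible next states.
                for prob, next_state, reward, done in env.P[s][a]:
                    # Expected update; ref: the Sutton book equation cited above.
                    v += action_prob * prob * (reward + discount_factor * V[next_state])
            # Track the largest change across all states in this sweep.
            delta = max(delta, np.abs(v - V[s]))
            V[s] = v
        # Stop once the value function is approximately stable.
        if delta < theta:
            break
    return np.array(V)
```

Run against the uniform `random_policy` from the notebook, this converges to the value function printed in the output cell above (0 at the terminal corners, down to roughly -22 at the hardest states).
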
20 changes: 10 additions & 10 deletions DP/Policy Evaluation.ipynb
@@ -4,7 +4,7 @@
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -30,7 +30,7 @@
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -41,9 +41,11 @@
" Args:\n",
" policy: [S, A] shaped matrix representing the policy.\n",
" env: OpenAI env. env.P represents the transition probabilities of the environment.\n",
" env.P[s][a] is a (prob, next_state, reward, done) tuple.\n",
" env.P[s][a] is a list of transition tuples (prob, next_state, reward, done).\n",
" env.nS is a number of states in the environment. \n",
" env.nA is a number of actions in the environment.\n",
" theta: We stop evaluation once our value function change is less than theta for all states.\n",
" discount_factor: gamma discount factor.\n",
" discount_factor: Gamma discount factor.\n",
" \n",
" Returns:\n",
" Vector of length env.nS representing the value function.\n",
@@ -60,7 +62,7 @@
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
"collapsed": true
},
"outputs": [],
"source": [
@@ -71,9 +73,7 @@
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"ename": "AssertionError",
@@ -121,9 +121,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 1
}
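
Both notebooks implement the same expected update, the iterative policy-evaluation backup (the Sutton book equation cited in the diff comments above):

\[
V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[r + \gamma V_k(s')\bigr]
\]

The inner double loop over actions and transitions is a direct transcription of the two sums: `action_prob` plays the role of \(\pi(a \mid s)\), `prob` of \(p(s', r \mid s, a)\), and `discount_factor` of \(\gamma\).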