Render history #149

Open
wants to merge 105 commits into
base: master
105 commits
6534061
Merge pull request #9 from littleV/gomoku
werner-duvaud Feb 22, 2020
cd3db2f
Add hyperparameters to TensorBoard and update menu
werner-duvaud Feb 23, 2020
a06b9fb
Update readme
werner-duvaud Feb 23, 2020
a850762
Added Tic-Tac-Toe game.
fidel-schaposnik Feb 27, 2020
697435e
Merge pull request #14 from fidel-schaposnik/tictactoe
werner-duvaud Feb 29, 2020
1454f56
Improve ResNet pooling and add abstract game
werner-duvaud Feb 29, 2020
9f3f25f
Add test again random agent
werner-duvaud Feb 29, 2020
7d3acbc
Add parameters for board games
werner-duvaud Mar 1, 2020
18be118
Add random agent results to tensorboard
werner-duvaud Mar 2, 2020
b16e4ae
Convergence of tic-tac-toe with fully connected network
werner-duvaud Mar 5, 2020
4aad737
Type error
manuel-delverme Mar 5, 2020
0d110d0
Merge pull request #17 from manuel-delverme/patch-1
werner-duvaud Mar 5, 2020
f2f0f6f
Add stack action to stacked observations
werner-duvaud Mar 8, 2020
1442e20
Fix MCTS
werner-duvaud Mar 11, 2020
1864750
Fix MCTS and typo
werner-duvaud Mar 14, 2020
c16e2ec
Fix OverflowError and Add Conv 1x1
ahainaut Mar 16, 2020
7b861c4
Improve lunarlander hyperparameters and scale encoded state gradients
werner-duvaud Mar 19, 2020
10b7fda
add PER support
xuxiyang1993 Mar 19, 2020
58af304
add PER support
xuxiyang1993 Mar 20, 2020
f51d2cb
Improve cartpole hyperparameters and fix typo
werner-duvaud Mar 20, 2020
976aa01
Merge pull request #23 from xuxiyang1993/master
werner-duvaud Mar 20, 2020
5c2e4d4
update PER support
xuxiyang1993 Mar 21, 2020
bdbf703
Merge pull request #25 from xuxiyang1993/master
werner-duvaud Mar 23, 2020
027ecc5
Improve loss scaling and fix MCTS
werner-duvaud Mar 26, 2020
43688f7
IS weights for prioritized replay
xuxiyang1993 Mar 27, 2020
4a20f90
Merge pull request #29 from xuxiyang1993/master
werner-duvaud Mar 28, 2020
b62bcce
Change batch aggregation, fix value in replay buffer and prepare merge
werner-duvaud Mar 28, 2020
2d02fc3
Merge branch 'master' into prioritized_replay
werner-duvaud Mar 28, 2020
5c61323
Merge pull request #30 from werner-duvaud/prioritized_replay
werner-duvaud Mar 28, 2020
ee1b333
Add notebook and fix merge
werner-duvaud Mar 28, 2020
5cd199d
Fix typo
werner-duvaud Mar 28, 2020
f79bc3c
Add selfplay / train ratio and improve reproductibility
werner-duvaud Mar 30, 2020
0fd671f
Add tree depth info
werner-duvaud Mar 31, 2020
927064a
Add Atari
werner-duvaud Apr 1, 2020
29824b6
td error for PER
werner-duvaud Apr 4, 2020
84e1447
Add value reanalyze
werner-duvaud Apr 4, 2020
cfa34f7
Fix bug in build of grpcio last version
werner-duvaud Apr 5, 2020
3b5bb4f
Improve memory with stacked observations
werner-duvaud Apr 5, 2020
ceb45c3
Turn replay buffer into numbered dict
werner-duvaud Apr 7, 2020
17882e4
Add mean value plot
werner-duvaud Apr 11, 2020
d7ba7d8
Refactor
ahainaut Apr 19, 2020
593c198
Fix #34 (last commit)
ahainaut Apr 21, 2020
2575a29
Fix numpy types #35
werner-duvaud Apr 22, 2020
b908eb8
Refactor
werner-duvaud Apr 24, 2020
b63f43b
Upon exit, replay buffer perists to disk, and can be optionally pre-p…
fred-drake Apr 25, 2020
8fc3f72
Update Breakout configuration
werner-duvaud Apr 26, 2020
c5d4b83
Merge pull request #38 from fdrake76/persistent_replay_buffer
werner-duvaud Apr 27, 2020
c2b0141
Refactor
werner-duvaud Apr 27, 2020
1943c08
Hyperparameters tuning for Gomoku and Connect4
ahainaut Apr 29, 2020
17f7b9b
Fix reward for more than 2 players
tfzee May 2, 2020
d3f679b
Update doc and refactor
werner-duvaud May 3, 2020
5bb15ec
Update initial priority
werner-duvaud May 3, 2020
a5457f8
Adding twentyone game as an example.
May 5, 2020
46c214b
Fixing minor issue with __init__.
May 5, 2020
c88737b
Typo
werner-duvaud May 6, 2020
e34de11
Merge pull request #43 from TimZF/patch-1
ahainaut May 6, 2020
3785486
Add conv1x1 in heads
ahainaut May 6, 2020
aa43378
Update README.md
werner-duvaud May 8, 2020
4573a95
Update README.md
werner-duvaud May 8, 2020
bd95d2b
Fixing learning rate and implemented resnet.
May 9, 2020
ed7ad49
Merge pull request #44 from AdrianAcala/twentyone
werner-duvaud May 10, 2020
a725915
Formatting
werner-duvaud May 10, 2020
74b2ad5
Fix #50
werner-duvaud May 12, 2020
9323f3b
Update README.md
werner-duvaud May 21, 2020
a70b36c
Fix #55
werner-duvaud Jun 1, 2020
6f27841
Update README.md
werner-duvaud Jun 1, 2020
0dfee62
Update README.md
ahainaut Jun 11, 2020
0727771
Add diagnose model
werner-duvaud Jun 22, 2020
fce9e71
Update parameters
ahainaut Jun 24, 2020
fee2f7e
Fix backpropagate
werner-duvaud Jun 30, 2020
fc29838
Add hp search and deterministic lunarlander, improve ratio metric and…
werner-duvaud Jul 26, 2020
c0f171b
Update muzero.py
werner-duvaud Jul 26, 2020
b615622
Add selfplay with gpu, multi gpu, better env closing, save hp search …
werner-duvaud Jul 30, 2020
afd0b6d
Add Reanalyse
werner-duvaud Aug 10, 2020
2cdd3e4
Add code structure img and uniform reanalyse
werner-duvaud Aug 12, 2020
4c4422f
Improve CPU/GPU management
werner-duvaud Aug 14, 2020
710a33f
Update replay_buffer.py
werner-duvaud Aug 16, 2020
e46b500
Add resume training and improve training exit
werner-duvaud Aug 20, 2020
4cdcae0
Fix string formatting (#77)
LukeWood Sep 2, 2020
ffbd4b3
Improve docstring and fix load replay buffer #75
werner-duvaud Sep 6, 2020
cd2c7a4
Fix tic tac toe action to string
werner-duvaud Sep 7, 2020
2600e3e
Fix reanalyse and format
werner-duvaud Sep 16, 2020
b7a7665
Add lunarlander checkpoint
werner-duvaud Sep 23, 2020
ab3ffe6
Renames window_size to replay_buffer_size (#83)
SanjoSolutions Sep 27, 2020
2bc3534
Update TicTacToe configuration
ahainaut Oct 29, 2020
4b500ff
Fix badge typo (#85)
sondrelg Nov 3, 2020
f752af6
Fix Pytorch 1.7
werner-duvaud Nov 6, 2020
ca40525
Parallelize get_batch in trainer (#86)
sergiovieri Nov 7, 2020
1583ad9
Fix GPU availability in actors with PyTorch 1.7
werner-duvaud Nov 11, 2020
f350a08
Fix #89 (residual block)
werner-duvaud Nov 20, 2020
8fba3da
Update README.md
werner-duvaud Dec 16, 2020
dccac8a
Fix reanalyse on GPU
werner-duvaud Dec 26, 2020
bdafa68
Typo fix #100
werner-duvaud Dec 29, 2020
469c327
Explicitly remove special lunarlander import fix #106
werner-duvaud Jan 7, 2021
a4bf9a7
Change spaces to underscores to fix SummaryWriter error messages (#104)
dribnet Jan 7, 2021
7883c3c
Added submenu for simpler loading of pre-trained models (#105)
dribnet Jan 7, 2021
fb9c6ca
Fix gomoku hyperparameters and format
werner-duvaud Jan 7, 2021
1e5c3d0
fix reward sign in compute_target_value()
mokemokechicken Jan 8, 2021
cf320f0
sample N games at one time in replay_buffer
mokemokechicken Jan 15, 2021
e864f59
Merge pull request #108 from mokemokechicken/fix_reward_sign_in_compu…
ahainaut Jan 27, 2021
14fe80d
Update comments
ahainaut Jan 27, 2021
a3e899f
Fix #114
ahainaut Feb 8, 2021
2f3e3cb
Merge pull request #117 from mokemokechicken/feature/sample_n_games_a…
ahainaut Feb 9, 2021
342e785
Merge branch 'master' of github.com:egafni/muzero-general
egafni Apr 15, 2021
3592fc9
render hist
egafni Apr 15, 2021
11 changes: 8 additions & 3 deletions games/cartpole.py
@@ -178,12 +178,17 @@ def close(self):
         """
         self.env.close()
 
-    def render(self):
+    def render(self, mode="rgb_array"):
         """
         Display the game observation.
         """
-        self.env.render()
-        input("Press enter to take a step ")
+        if mode == "default":
+            self.env.render()
+            input("Press enter to take a step ")
+        elif mode == "rgb_array":
+            return self.env.render(mode="rgb_array")
+        else:
+            raise ValueError(f'{mode} is not a valid mode')
 
     def action_to_string(self, action_number):
         """
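The `render` change in games/cartpole.py dispatches on a `mode` argument. A minimal, self-contained sketch of that dispatch is given here so the behavior can be tried without Gym installed. `StubEnv` and the tiny nested-list "frame" are hypothetical stand-ins and not part of the PR; only the body of `Game.render` mirrors the diff.

```python
class StubEnv:
    """Hypothetical stand-in for the wrapped Gym environment (not part of the PR)."""

    def render(self, mode="human"):
        # Return a tiny 2x2 RGB "frame" as nested lists when pixels are requested;
        # a real environment would typically return a NumPy array here.
        if mode == "rgb_array":
            return [[[0, 0, 0], [255, 255, 255]],
                    [[255, 255, 255], [0, 0, 0]]]
        return None  # on-screen rendering has no useful return value


class Game:
    def __init__(self, env):
        self.env = env

    def render(self, mode="rgb_array"):
        """Same dispatch as in the cartpole.py diff."""
        if mode == "default":
            self.env.render()
            input("Press enter to take a step ")
        elif mode == "rgb_array":
            return self.env.render(mode="rgb_array")
        else:
            raise ValueError(f"{mode} is not a valid mode")


game = Game(StubEnv())
frame = game.render()  # the default mode is "rgb_array", so a frame is returned
print(len(frame), len(frame[0]), len(frame[0][0]))

try:
    game.render(mode="ansi")  # any unknown mode is rejected
except ValueError as err:
    print(err)
```

With this signature, a bare `render()` call returns a frame instead of opening a window and waiting for Enter; callers that want the previous interactive behavior have to pass `mode="default"` explicitly.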
2 changes: 2 additions & 0 deletions muzero.py
@@ -362,6 +362,8 @@ def test(
             num_tests (int): Number of games to average. Defaults to 1.
 
             num_gpus (int): Number of GPUs to use, 0 forces to use the CPU. Defaults to 0.
+
+            render_history (bool): whether to store a history of the rendered environment
         """
         opponent = opponent if opponent else self.config.opponent
         muzero_player = muzero_player if muzero_player else self.config.muzero_player
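The muzero.py hunk only documents the `render_history` flag; the part of the diff visible here does not include the code that consumes it. As a rough illustration of what "storing a history of the rendered environment" could look like, the sketch below appends one frame per step when the flag is set. `CountingGame`, `play_episode`, and the dictionary used as a frame are all hypothetical and are not taken from the PR.

```python
class CountingGame:
    """Hypothetical game whose 'frame' is just the current step counter."""

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.steps = 0

    def step(self):
        self.steps += 1
        return self.steps >= self.max_steps  # True once the episode is over

    def render(self, mode="rgb_array"):
        if mode != "rgb_array":
            raise ValueError(f"{mode} is not a valid mode")
        return {"step": self.steps}  # stand-in for an RGB array


def play_episode(game, render_history=False):
    """Play one episode; return the collected frames (empty if the flag is off)."""
    history = []
    done = False
    while not done:
        done = game.step()
        if render_history:
            # Assumed behavior: keep the frame instead of displaying it.
            history.append(game.render(mode="rgb_array"))
    return history


frames = play_episode(CountingGame(max_steps=3), render_history=True)
print(len(frames))
```

Collecting frames this way relies on `render` returning the frame rather than displaying it, which is what the `"rgb_array"` branch in games/cartpole.py provides.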