Add MsPacman agents (#274)
* Add MsPacman agents

* Allow to force custom objects

* Update changelog

* Fix loading issues
araffin committed Aug 6, 2022
1 parent c2f00ea commit 89d4e0c
Showing 11 changed files with 745 additions and 9 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -1,10 +1,12 @@
-## Release 1.5.1a8 (WIP)
+## Release 1.6.0 (2022-08-05)

### Breaking Changes
- Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
- Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
- Updated default --eval-freq from 10k to 25k steps
- Update default horizon to 2 for the `HistoryWrapper`
+- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
+- Upgrade to sb3-contrib >= 1.6.0

### New Features
- Support setting PyTorch's device with the `--device` flag (@gregwar)
@@ -14,6 +16,7 @@
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Added autodownload for "official" sb3 models from the hub
- Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
+- Added MsPacman models

### Bug fixes
- Fix `Reacher-v3` name in PPO hyperparameter file
10 changes: 5 additions & 5 deletions README.md
@@ -331,7 +331,7 @@ The previous command will create a `mp4` file. To convert this file to `gif` for
python -m utils.record_training --algo ppo --env CartPole-v1 -n 1000 -f logs --deterministic --gif
```

-## Current Collection: 150+ Trained Agents!
+## Current Collection: 195+ Trained Agents!

Final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute them, simply run `python -m utils.benchmark`.

@@ -354,10 +354,10 @@ Additional Atari Games (to be completed):

| RL Algo | MsPacman | Asteroids | RoadRunner |
|----------|-------------|-----------|------------|
-| A2C | | :heavy_check_mark: | :heavy_check_mark: |
-| PPO | | :heavy_check_mark: | :heavy_check_mark: |
-| DQN | | :heavy_check_mark: | :heavy_check_mark: |
-| QR-DQN | | :heavy_check_mark: | :heavy_check_mark: |
+| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
+| QR-DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |


### Classic Control Environments
4 changes: 4 additions & 0 deletions benchmark.md
@@ -39,6 +39,7 @@ and also allow users to have access to pretrained agents.*
|a2c |LunarLanderContinuous-v2 | 84.225| 145.906|5M | 149305| 256|
|a2c |MountainCar-v0 | -111.263| 24.087|1M | 149982| 1348|
|a2c |MountainCarContinuous-v0 | 91.166| 0.255|100k | 149923| 1659|
+|a2c |MsPacmanNoFrameskip-v4 | 1671.730| 612.918|10M | 602450| 185|
|a2c |Pendulum-v1 | -162.965| 103.210|1M | 150000| 750|
|a2c |PongNoFrameskip-v4 | 17.292| 3.214|10M | 594910| 65|
|a2c |QbertNoFrameskip-v4 | 3882.345| 1223.327|10M | 610670| 194|
@@ -77,6 +78,7 @@ and also allow users to have access to pretrained agents.*
|dqn |EnduroNoFrameskip-v4 | 830.929| 194.544|10M | 599040| 14|
|dqn |LunarLander-v2 | 154.382| 79.241|100k | 149373| 200|
|dqn |MountainCar-v0 | -100.849| 9.925|120k | 149962| 1487|
+|dqn |MsPacmanNoFrameskip-v4 | 2682.929| 492.567|10M | 599952| 140|
|dqn |PongNoFrameskip-v4 | 20.602| 0.613|10M | 598998| 88|
|dqn |QbertNoFrameskip-v4 | 9496.774| 5399.633|10M | 605844| 124|
|dqn |RoadRunnerNoFrameskip-v4 | 40396.350| 7069.131|10M | 603257| 137|
@@ -100,6 +102,7 @@ and also allow users to have access to pretrained agents.*
|ppo |LunarLanderContinuous-v2 | 270.863| 32.072|1M | 149956| 526|
|ppo |MountainCar-v0 | -110.423| 19.473|1M | 149954| 1358|
|ppo |MountainCarContinuous-v0 | 88.343| 2.572|20k | 149983| 633|
+|ppo |MsPacmanNoFrameskip-v4 | 1754.356| 172.783|10M | 600822| 163|
|ppo |Pendulum-v1 | -172.225| 104.159|100k | 150000| 750|
|ppo |PongNoFrameskip-v4 | 20.989| 0.105|10M | 599902| 90|
|ppo |QbertNoFrameskip-v4 | 15627.108| 3313.538|10M | 600248| 83|
@@ -122,6 +125,7 @@ and also allow users to have access to pretrained agents.*
|qrdqn |EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
|qrdqn |LunarLander-v2 | 70.236| 225.491|100k | 149957| 522|
|qrdqn |MountainCar-v0 | -106.042| 15.536|120k | 149943| 1414|
+|qrdqn |MsPacmanNoFrameskip-v4 | 997.867| 877.130|10M | 604914| 225|
|qrdqn |PongNoFrameskip-v4 | 20.492| 0.687|10M | 597443| 63|
|qrdqn |QbertNoFrameskip-v4 | 14799.728| 2917.629|10M | 600773| 92|
|qrdqn |RoadRunnerNoFrameskip-v4 | 42325.424| 8361.161|10M | 591016| 59|
6 changes: 5 additions & 1 deletion enjoy.py
@@ -62,6 +62,10 @@ def main(): # noqa: C901
     parser.add_argument(
         "--env-kwargs", type=str, nargs="+", action=StoreDict, help="Optional keyword argument to pass to the env constructor"
     )
+    parser.add_argument(
+        "--custom-objects", action="store_true", default=False, help="Use custom objects to solve loading issues"
+    )
+
     args = parser.parse_args()

     # Going through custom gym packages to let them register in the global registry
@@ -170,7 +174,7 @@ def main(): # noqa: C901
     newer_python_version = sys.version_info.major == 3 and sys.version_info.minor >= 8

     custom_objects = {}
-    if newer_python_version:
+    if newer_python_version or args.custom_objects:
         custom_objects = {
             "learning_rate": 0.0,
             "lr_schedule": lambda _: 0.0,
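Note on the `enjoy.py` change: the second hunk applies the existing `custom_objects` override either on Python 3.8+ or whenever `--custom-objects` is passed. The sketch below isolates that decision so it can be exercised on its own. `build_custom_objects` is a hypothetical helper, not a function in the repository. Only the two dictionary keys visible in the truncated hunk are reproduced, and how the dictionary is consumed later in `enjoy.py` is not shown in this diff.

```python
import argparse
import sys


def build_custom_objects(force: bool, version_info=sys.version_info) -> dict:
    """Return the load-time overrides, or an empty dict when none are needed.

    Hypothetical helper mirroring the condition in the diff:
    overrides are used on Python 3.8+ or when the flag forces them.
    """
    newer_python_version = version_info[0] == 3 and version_info[1] >= 8
    if not (newer_python_version or force):
        return {}
    # Only the keys visible in the diff; the hunk is cut off after these two.
    return {
        "learning_rate": 0.0,
        "lr_schedule": lambda _: 0.0,
    }


parser = argparse.ArgumentParser()
parser.add_argument(
    "--custom-objects", action="store_true", default=False, help="Use custom objects to solve loading issues"
)

# Simulate an older interpreter (3.7) with the flag set: the overrides are still built.
args = parser.parse_args(["--custom-objects"])
custom_objects = build_custom_objects(args.custom_objects, version_info=(3, 7))
print(sorted(custom_objects))  # ['learning_rate', 'lr_schedule']
```

Without the flag and on an interpreter older than 3.8, the helper returns an empty dict, which matches the behaviour before this commit.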
187 changes: 187 additions & 0 deletions logs/benchmark/a2c-MsPacmanNoFrameskip-v4/0.monitor.csv
@@ -0,0 +1,187 @@
#{"t_start": 1659728336.7725544, "env_id": "MsPacmanNoFrameskip-v4"}
r,l,t
1730.0,3234,3.953365
1560.0,3298,5.427331
1100.0,2714,6.639803
2360.0,3634,8.255448
1950.0,3874,9.985612
2450.0,4090,11.803499
1070.0,3074,13.174083
1710.0,2530,14.298565
4470.0,4194,16.174039
1790.0,2874,17.454673
2030.0,3162,18.860609
1600.0,2266,19.866352
2440.0,3610,21.472004
1150.0,2210,22.459134
1530.0,3554,24.041418
1270.0,2626,25.208103
1280.0,2482,26.314844
1680.0,3226,27.752215
1100.0,2514,28.87338
2360.0,5218,31.201759
1200.0,2970,32.53076
960.0,3418,34.053116
1730.0,2722,35.263485
1520.0,4170,37.123587
1660.0,2682,38.316556
1580.0,2946,39.626874
1960.0,3530,41.199417
1170.0,2530,42.328518
2130.0,3890,44.061271
1910.0,3970,45.832457
1810.0,3050,47.194604
2430.0,4418,49.168339
2320.0,2938,50.474335
2120.0,4906,52.668199
1360.0,2850,53.939555
1020.0,2346,54.978707
2280.0,3698,56.63467
1560.0,3866,58.354663
1200.0,3082,59.732336
1180.0,3834,61.443044
2100.0,3586,63.045426
1010.0,2754,64.285646
1240.0,2674,65.485508
2990.0,3370,66.994512
1290.0,2466,68.149448
1440.0,3130,69.61241
1560.0,2458,70.763192
1700.0,3106,72.160394
2130.0,3522,73.813967
4950.0,3890,75.72441
1950.0,2978,77.180707
2030.0,4090,79.102359
1550.0,3322,80.647296
1530.0,3626,82.287508
1090.0,3162,83.792252
2030.0,3442,85.417475
1400.0,2906,86.786747
1560.0,3106,88.266386
1210.0,2658,89.512441
2770.0,4034,91.376633
1200.0,3242,92.889384
3110.0,4722,95.071795
1060.0,2738,96.366151
1310.0,3338,97.908794
1810.0,3218,99.350722
2580.0,3018,100.764254
1780.0,3906,102.538961
1630.0,2954,103.861392
1340.0,2978,105.20175
1160.0,2330,106.260769
1650.0,3818,108.327378
1100.0,2818,109.85005
950.0,2346,111.110954
900.0,2234,112.313941
1670.0,3858,114.393967
2230.0,4506,116.827441
1340.0,2786,118.330728
3030.0,3738,120.362621
2090.0,3714,122.375252
1240.0,3218,124.109298
1220.0,3170,125.812876
1310.0,2634,127.262277
1570.0,3194,128.965139
1290.0,3058,130.362771
1820.0,2850,131.608693
1950.0,3842,133.281789
1430.0,2722,134.451039
2100.0,3586,136.02444
1750.0,3594,137.63478
1590.0,2706,138.844255
1500.0,3570,140.421348
1670.0,4362,142.362482
1490.0,3314,143.837664
730.0,2874,145.113736
2590.0,4802,147.192036
1230.0,2850,148.438278
1800.0,3650,150.012589
3370.0,3370,151.468725
1190.0,2610,152.591411
1810.0,4946,154.727196
2520.0,4290,156.59002
1380.0,2874,157.903495
2490.0,3810,159.663447
1670.0,2714,160.8516
1500.0,3954,162.555276
1570.0,3690,164.145591
1690.0,3738,165.75474
1550.0,4538,167.75505
1650.0,3562,169.36602
1260.0,2970,170.68833
1670.0,2874,171.928043
1940.0,3018,173.302168
1030.0,2682,174.551034
1890.0,3618,176.189759
1160.0,3034,177.564327
1680.0,3266,179.01814
1840.0,3506,180.649786
1070.0,2538,181.787977
2030.0,3938,183.56093
2960.0,4634,185.557277
1920.0,3410,187.089478
1620.0,3066,188.49645
1260.0,2466,189.63132
1030.0,2914,190.900608
1740.0,2834,192.218688
2340.0,3682,193.901766
1110.0,2762,195.125856
1190.0,2786,196.371832
1820.0,3306,197.829656
1930.0,3010,199.169468
1200.0,2434,200.343077
1380.0,3802,202.276492
1790.0,2810,203.649479
1980.0,4170,205.609311
1990.0,3226,207.144824
1700.0,4266,209.257385
1010.0,2658,210.532377
1240.0,2586,211.735485
1580.0,3186,213.318766
1090.0,2762,214.718967
1370.0,3194,216.254075
1770.0,4986,218.678091
1140.0,2306,219.77412
1470.0,3538,221.694823
870.0,2026,222.677573
1710.0,3362,224.210367
2410.0,5522,226.616171
1280.0,2698,227.791895
1150.0,2714,228.979634
1400.0,3610,230.616868
1630.0,3730,232.327563
1750.0,2914,233.723598
1150.0,2626,234.917321
1730.0,3642,236.556935
1200.0,3098,237.904774
1430.0,3842,239.581436
1350.0,2730,240.753289
1470.0,4874,242.852802
1760.0,3170,244.236137
1550.0,2890,245.477554
1580.0,3298,246.899028
1310.0,2866,248.130993
1590.0,3066,249.452029
480.0,1762,250.20387
2260.0,3842,251.851681
1930.0,4330,253.715691
1110.0,2866,254.948357
1120.0,2498,256.028346
1960.0,2610,257.199669
1030.0,2482,258.326828
2170.0,2178,259.308928
1890.0,3242,260.714789
1210.0,2706,261.879737
1670.0,3738,263.499191
1130.0,2138,264.428535
1840.0,3546,265.977581
1840.0,3010,267.331547
1400.0,3298,268.879427
2490.0,3042,270.281823
1180.0,2386,271.379688
4340.0,3738,273.109107
1220.0,3106,274.53357
1260.0,2714,275.779621
1310.0,2794,277.063536
1040.0,2674,278.295469
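Note on the monitor file format: the first line is a `#` followed by JSON metadata (`t_start`, `env_id`), and the remaining lines are CSV with columns `r` (episode reward), `l` (episode length in steps) and `t` (seconds elapsed since `t_start`). The sketch below shows one way such a file could be summarised using only the standard library. `summarize_monitor` is a hypothetical name, and `utils.benchmark` is not part of this diff, so it is an assumption that the benchmark table is derived this way or that it uses the population standard deviation. The sample embeds only the first three episodes of the file above.

```python
import csv
import io
import json
import statistics

# First three episodes of logs/benchmark/a2c-MsPacmanNoFrameskip-v4/0.monitor.csv
SAMPLE = """#{"t_start": 1659728336.7725544, "env_id": "MsPacmanNoFrameskip-v4"}
r,l,t
1730.0,3234,3.953365
1560.0,3298,5.427331
1100.0,2714,6.639803
"""


def summarize_monitor(text: str) -> dict:
    """Summarise a monitor CSV given as text (hypothetical helper)."""
    lines = io.StringIO(text)
    # The first line is '#' + JSON metadata, not part of the CSV body.
    header = json.loads(lines.readline().lstrip("#"))
    rows = list(csv.DictReader(lines))
    rewards = [float(row["r"]) for row in rows]
    lengths = [int(row["l"]) for row in rows]
    return {
        "env_id": header["env_id"],
        "n_episodes": len(rows),
        "mean_reward": statistics.mean(rewards),
        # Assumption: population standard deviation.
        "std_reward": statistics.pstdev(rewards),
        "n_timesteps": sum(lengths),
    }


summary = summarize_monitor(SAMPLE)
print(summary["n_episodes"], summary["n_timesteps"])  # 3 9246
```

To run it on the full file, pass the file's contents instead of `SAMPLE`, for example `summarize_monitor(open(path).read())` with `path` pointing at the monitor CSV.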
4 changes: 4 additions & 0 deletions logs/benchmark/benchmark.md
@@ -39,6 +39,7 @@ and also allow users to have access to pretrained agents.*
|a2c |LunarLanderContinuous-v2 | 84.225| 145.906|5M | 149305| 256|
|a2c |MountainCar-v0 | -111.263| 24.087|1M | 149982| 1348|
|a2c |MountainCarContinuous-v0 | 91.166| 0.255|100k | 149923| 1659|
+|a2c |MsPacmanNoFrameskip-v4 | 1671.730| 612.918|10M | 602450| 185|
|a2c |Pendulum-v1 | -162.965| 103.210|1M | 150000| 750|
|a2c |PongNoFrameskip-v4 | 17.292| 3.214|10M | 594910| 65|
|a2c |QbertNoFrameskip-v4 | 3882.345| 1223.327|10M | 610670| 194|
@@ -77,6 +78,7 @@ and also allow users to have access to pretrained agents.*
|dqn |EnduroNoFrameskip-v4 | 830.929| 194.544|10M | 599040| 14|
|dqn |LunarLander-v2 | 154.382| 79.241|100k | 149373| 200|
|dqn |MountainCar-v0 | -100.849| 9.925|120k | 149962| 1487|
+|dqn |MsPacmanNoFrameskip-v4 | 2682.929| 492.567|10M | 599952| 140|
|dqn |PongNoFrameskip-v4 | 20.602| 0.613|10M | 598998| 88|
|dqn |QbertNoFrameskip-v4 | 9496.774| 5399.633|10M | 605844| 124|
|dqn |RoadRunnerNoFrameskip-v4 | 40396.350| 7069.131|10M | 603257| 137|
@@ -100,6 +102,7 @@ and also allow users to have access to pretrained agents.*
|ppo |LunarLanderContinuous-v2 | 270.863| 32.072|1M | 149956| 526|
|ppo |MountainCar-v0 | -110.423| 19.473|1M | 149954| 1358|
|ppo |MountainCarContinuous-v0 | 88.343| 2.572|20k | 149983| 633|
+|ppo |MsPacmanNoFrameskip-v4 | 1754.356| 172.783|10M | 600822| 163|
|ppo |Pendulum-v1 | -172.225| 104.159|100k | 150000| 750|
|ppo |PongNoFrameskip-v4 | 20.989| 0.105|10M | 599902| 90|
|ppo |QbertNoFrameskip-v4 | 15627.108| 3313.538|10M | 600248| 83|
@@ -122,6 +125,7 @@ and also allow users to have access to pretrained agents.*
|qrdqn |EnduroNoFrameskip-v4 | 3231.200| 1311.801|10M | 585728| 5|
|qrdqn |LunarLander-v2 | 70.236| 225.491|100k | 149957| 522|
|qrdqn |MountainCar-v0 | -106.042| 15.536|120k | 149943| 1414|
+|qrdqn |MsPacmanNoFrameskip-v4 | 997.867| 877.130|10M | 604914| 225|
|qrdqn |PongNoFrameskip-v4 | 20.492| 0.687|10M | 597443| 63|
|qrdqn |QbertNoFrameskip-v4 | 14799.728| 2917.629|10M | 600773| 92|
|qrdqn |RoadRunnerNoFrameskip-v4 | 42325.424| 8361.161|10M | 591016| 59|