Add results for various strategies against Slumbot
Gongsta committed Jun 22, 2024
1 parent 34ac0ff commit e8f6e51
Showing 12 changed files with 302 additions and 24 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,6 +1,8 @@
.vscode
old/
*.out
# Local Files
animation/media
animation/media
.DS_Store

# Byte-compiled / optimized / DLL files
54 changes: 39 additions & 15 deletions README.md
@@ -51,16 +51,33 @@ Poker is an interesting game to work on because it is an imperfect information game
- [ ] Implement depth-limited solving to improve the strategy in real-time
- [ ] Implement Computer Vision + Deep Learning to recognize Poker cards, so you can deploy this model in real life by mounting a camera to your head

## Important Files
- `poker_main.py` contains code for the GUI interface to the Poker game
- `environment.py` contains the game logic
- `aiplayer.py` contains logic to interface with AI
- `abstraction.py` contains logic for clustering cards based on equity
- `postflop_holdem.py` contains the logic for training Poker AI for **postflop**
- `preflop_hodlem.py` contains logic for training Poker AI for **preflop**

### Timeline
06-06-2022: Created a basic Poker Environment in PyGame to play in. Wrote classes for `Card`, `Deck`, `Player`, `PokerEnvironment`. Used bitmasks to quickly evaluate the strength of a hand.
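
A minimal sketch of the bitmask idea (illustrative, not the project's exact evaluator): OR each card rank into one integer so rank patterns like straights become cheap bit operations. `rank_bitmask` and `has_straight` are hypothetical names.

```python
# Illustrative bitmask sketch (not the project's exact evaluator): OR each
# card rank (2..14, ace = 14) into one integer, so rank patterns like
# straights become cheap bit operations instead of sorting and scanning.
def rank_bitmask(ranks):
    mask = 0
    for r in ranks:
        mask |= 1 << r
    return mask

def has_straight(ranks):
    mask = rank_bitmask(ranks)
    for low in range(2, 11):            # windows 2-6 up through 10-A
        window = 0b11111 << low
        if mask & window == window:     # five consecutive rank bits set
            return True
    wheel = (1 << 14) | (0b1111 << 2)   # ace-low straight A-2-3-4-5
    return mask & wheel == wheel

print(has_straight([14, 2, 3, 4, 5]))   # True (the wheel)
```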

07-01-2022: Started learning about writing the AI. Explored different reinforcement learning algorithms and looked into what papers have done. Realized that standard RL algorithms don't work at all in imperfect information games: they fail at even the simplest game of Rock-Paper-Scissors, because the policy they converge to is deterministic and easily exploitable. What we need is a game theory approach, using the idea of Counterfactual Regret Minimization (CFR) to create a strategy that converges to the Nash Equilibrium.

07-05-2022: Implemented regret-matching for Rock-Paper-Scissors
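
A minimal regret-matching sketch for Rock-Paper-Scissors (illustrative, not necessarily the repo's version): play each action in proportion to its positive cumulative regret; the *average* strategy is what converges. The fixed opponent mix is an assumption for the demo.

```python
# Regret matching for RPS: a sketch, not the repo's exact code.
import numpy as np

N = 3  # rock, paper, scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def get_strategy(regret_sum):
    positive = np.maximum(regret_sum, 0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(N, 1 / N)

rng = np.random.default_rng(0)
regret_sum = np.zeros(N)
strategy_sum = np.zeros(N)
opp_mix = np.array([0.4, 0.3, 0.3])  # assumed rock-heavy fixed opponent

for _ in range(100_000):
    strategy = get_strategy(regret_sum)
    strategy_sum += strategy
    my_action = rng.choice(N, p=strategy)
    opp_action = rng.choice(N, p=opp_mix)
    # Regret: what each alternative would have earned vs. what we earned.
    utilities = np.array([payoff(a, opp_action) for a in range(N)])
    regret_sum += utilities - utilities[my_action]

print(strategy_sum / strategy_sum.sum())  # converges toward always-paper here
```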

07-15-2022: Wrote the vanilla CFR code for Kuhn Poker
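
For reference, a compact vanilla-CFR sketch for Kuhn Poker in the style of Neller's "An Introduction to CFR" tutorial (illustrative, not the repo's exact code): recurse over the game tree, weight regrets by the opponent's reach probability, and accumulate the average strategy.

```python
# Vanilla CFR for Kuhn Poker, a sketch after Neller & Lanctot's tutorial.
# History is a string over 'p' (pass/check) and 'b' (bet).
import random
import numpy as np

PASS, BET = 0, 1
NUM_ACTIONS = 2
nodes = {}  # info set string -> Node

class Node:
    def __init__(self):
        self.regret_sum = np.zeros(NUM_ACTIONS)
        self.strategy_sum = np.zeros(NUM_ACTIONS)

    def get_strategy(self, realization_weight):
        # Regret matching: play in proportion to positive cumulative regret.
        strategy = np.maximum(self.regret_sum, 0)
        total = strategy.sum()
        strategy = strategy / total if total > 0 else np.full(NUM_ACTIONS, 0.5)
        self.strategy_sum += realization_weight * strategy
        return strategy

    def get_average_strategy(self):
        total = self.strategy_sum.sum()
        return self.strategy_sum / total if total > 0 else np.full(NUM_ACTIONS, 0.5)

def cfr(cards, history, p0, p1):
    """Returns expected utility for the player currently acting."""
    plays = len(history)
    player = plays % 2
    if plays > 1:  # check for terminal states
        higher = cards[player] > cards[1 - player]
        if history[-1] == 'p':
            if history == 'pp':           # check-check: showdown for the antes
                return 1 if higher else -1
            return 1                      # 'bp' or 'pbp': opponent folded
        if history[-2:] == 'bb':          # bet-call: showdown for 2
            return 2 if higher else -2
    info_set = str(cards[player]) + history
    node = nodes.setdefault(info_set, Node())
    strategy = node.get_strategy(p0 if player == 0 else p1)
    util = np.zeros(NUM_ACTIONS)
    for a in range(NUM_ACTIONS):
        next_history = history + ('p' if a == PASS else 'b')
        if player == 0:
            util[a] = -cfr(cards, next_history, p0 * strategy[a], p1)
        else:
            util[a] = -cfr(cards, next_history, p0, p1 * strategy[a])
    node_util = float(np.dot(strategy, util))
    # Counterfactual regret is weighted by the opponent's reach probability.
    node.regret_sum += (p1 if player == 0 else p0) * (util - node_util)
    return node_util

cards = [1, 2, 3]  # jack, queen, king
for _ in range(20000):
    random.shuffle(cards)
    cfr(cards, "", 1.0, 1.0)
print({s: nodes[s].get_average_strategy().round(2) for s in sorted(nodes)})
```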

09-07-2022: Implemented abstractions to reduce the size of the Poker game being solved. Implemented a basic Monte-Carlo method to calculate the expected hand strength (EHS) of a pair of hole cards at different stages of the game, assuming a uniform random draw of opponent hands and a uniform random rollout of community cards. Implemented a simple clustering algorithm that uses these EHS values to cluster various cards / scenarios together.
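
A hedged sketch of that Monte-Carlo EHS estimate: `deck` is the full 52-card list and `evaluate(seven_cards) -> int` is a hand evaluator where higher is better; both are assumed helpers, not the repo's exact API.

```python
# Monte-Carlo expected hand strength (EHS), a sketch under the assumptions above.
import random

def monte_carlo_ehs(hole_cards, board, deck, evaluate, n=1000):
    remaining = [c for c in deck if c not in hole_cards and c not in board]
    wins = ties = 0
    for _ in range(n):
        # Uniform random opponent hand + uniform random board rollout.
        sample = random.sample(remaining, 2 + (5 - len(board)))
        opp_hole, rollout = sample[:2], sample[2:]
        full_board = board + rollout
        ours = evaluate(hole_cards + full_board)
        theirs = evaluate(opp_hole + full_board)
        wins += ours > theirs
        ties += ours == theirs
    return (wins + 0.5 * ties) / n  # ties count as half a win
```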

09-20-2022: Used this project as a personal poker trainer (displaying the pot odds). It can help you refine your game; see the `learn_pot_odds.py` file.
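
Pot odds in one line (a sketch; `learn_pot_odds.py` may compute it differently): the fraction of the final pot you must put in to call, i.e., the minimum equity needed for a break-even call.

```python
def pot_odds(pot, to_call):
    # call amount divided by the pot after your call goes in
    return to_call / (pot + to_call)

print(pot_odds(100, 50))  # 0.333... -> you need >33% equity to call profitably
```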

09-30-2022: Wrote the CFR code as a library, since there is no universal library support for CFR. I wish the researchers had released theirs, but everyone seems to just do their own thing. It feels like the early days of neural networks, when everyone would write their own backward pass for backpropagation, until TensorFlow and PyTorch came along.

06-15-2024: Started revisiting the project. Tried to train on the full poker game tree, but noticed that there were too many states to train on.

06-17-2024: Split into preflop training and postflop training. Started training over 1,000,000 different hands, with the dataset generated in `src/dataset`.

06-18-2024: Used simple equity to cluster, since computing full equity distributions was too slow on my machine.

### Dataset
@@ -70,8 +87,14 @@ I generated 1,000,000 hands of poker data offline, which I used to train the AI.
Poker has very high variance, which makes it hard to benchmark. I've benchmarked against [Slumbot](https://www.slumbot.com/), one of the best poker bots in the world as of 2017, measured across ~10,000 hands. API code is in [slumbot_api.py](slumbot/slumbot_api.py); visualizations are generated from the [visualize.ipynb](slumbot/visualize.ipynb) notebook.
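
The BB/100 numbers below are big blinds won per 100 hands. A minimal sketch of the metric; the per-hand chip deltas and the 100-chip big blind (Slumbot plays 50/100 blinds with 20,000-chip stacks, to my knowledge) are assumptions here.

```python
def bb_per_100(results, big_blind=100):
    """results: list of per-hand chip deltas; returns average BB won per 100 hands."""
    return 100 * sum(results) / (big_blind * len(results))
```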

The first four strategies (0-3) implement logic purely based on heuristics.
### Strategy 0: All-in (-295.895 BB/100)

```python
incr = "b20000"  # bet the full 20,000-chip starting stack, i.e., go all-in
```

![Strategy 0](results/strategy0.png)
### Strategy 1: Always checking or calling the opponent's bet (-123.335 BB/100)
This is the most naive implementation, where we always check or call the opponent's bet. We never fold.


@@ -82,11 +105,11 @@
```python
else:  # opponent has bet, so simply call
    incr = "c"
```


![Strategy 1](results/strategy1.png)


### Strategy 2: Naive bet by equity (-112.045 BB/100)
```python
equity = calculate_equity(hole_cards, board, n=5000)
print(f"equity calculated: {equity} for hole cards: {hole_cards} and board: {board}")
```
@@ -102,24 +125,25 @@
```python
else:
    incr = "f"  # fold when equity is too low
```
![Strategy 2](results/strategy2.png)

### Strategy 3: More advanced equity (-204.2917 BB/100)
A more advanced heuristic that makes bets based on the current equity (see `slumbot/slumbot_api.py`).
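
One guess at what "more advanced" could look like: scale the bet size with equity. Purely illustrative; the real thresholds and sizes live in `slumbot/slumbot_api.py`, and `equity_action` is a hypothetical helper.

```python
def equity_action(equity, pot_size):
    if equity > 0.8:
        return f"b{pot_size}"       # pot-sized bet with very strong hands
    if equity > 0.6:
        return f"b{pot_size // 2}"  # half-pot with strong hands
    if equity > 0.4:
        return "c"                  # call with marginal hands
    return "f"                      # fold the rest
```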

I actually played this "AI" against my dad, and it beat him :P

![Strategy 3](results/strategy3.png)

### Strategy 4: CFR (WORK-IN-PROGRESS)
CFR on a very abstracted version of the game. Preflop and postflop are solved independently through `preflop_holdem.py` and `postflop_holdem.py`. Abstractions are computed in `src/abstraction.py`.

Still need to implement k-means clustering for the flop, turn, and river.
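
A sketch of that planned k-means step, assuming each flop/turn hand is summarized by an equity-distribution histogram (the exploration notebook settles on 50 clusters for flop/turn and 10 for river). Uses scikit-learn's `KMeans`; the helper name and array shapes are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_hands(equity_histograms, n_clusters=50, seed=0):
    """equity_histograms: (n_hands, n_buckets) array; returns a cluster id per hand."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(np.asarray(equity_histograms))
```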


![Strategy 4](results/strategy4.png)




## High-Level overview of AI
14 changes: 13 additions & 1 deletion notebooks/abstraction_exploration.ipynb
@@ -389,6 +389,18 @@
"visualizer.fit(turn_equity_distributions) # Fit the data to the visualizer\n",
"visualizer.show() # Finalize and render the figure"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I will proceed with 50 clusters for flop and turn, and 10 for river (river doesn't need equity distribution). It seems to be a good balance between speed and performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
@@ -407,7 +419,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
"version": "3.12.3"
},
"orig_nbformat": 4,
"vscode": {
89 changes: 89 additions & 0 deletions resources.md
@@ -0,0 +1,89 @@
# Resources
A non-exhaustive list of repositories, articles, and papers I have consulted to put this project together. To be continually updated.

Git Repositories
- https://github.com/ai-decision/decisionholdem
- A recently open-sourced solution, though it seems it doesn't include the code for abstractions or depth-limited solving
- https://github.com/matthewkennedy5/Poker -> Really good writing
- https://github.com/fedden/poker_ai
- https://github.com/jneckar/skybet
- https://github.com/zanussbaum/pluribus (An attempt at implementing Pluribus)
- https://github.com/doas3140/PyStack (Python Implementation of DeepStack)
- These students tried to make a copy of Libratus: https://github.com/michalp21/coms4995-finalproj
- https://github.com/tansey/pycfr (8 years old) -> a CFR implementation; does not support no-limit Texas Hold'em
- Pokerbot https://github.com/dickreuter/Poker
- Gym Environment https://github.com/dickreuter/neuron_poker

Blogs
- https://int8.io/counterfactual-regret-minimization-for-poker-ai/
- https://aipokertutorial.com/

Other:
- Really good [tutorial](https://aipokertutorial.com/) by a guy who played online poker for 10+ years
- Poker Mathematics [Book](http://www.pokerbooks.lt/books/en/The_Mathematics_of_Poker.pdf)

Paper links
- [An Introduction to CFR](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf) (Neller, 2013) ESSENTIAL
- **Vanilla CFR**
- (CFR first introduced) [Regret Minimization in Games with Incomplete Information](https://poker.cs.ualberta.ca/publications/NIPS07-cfr.pdf) (Bowling, 2007)
- [Using CFR to Create Competitive Multiplayer Poker Agents](https://poker.cs.ualberta.ca/publications/AAMAS10.pdf) (Risk, 2010)
- [Efficient MCCFR in Games with Many Player Actions](https://proceedings.neurips.cc/paper/2012/file/3df1d4b96d8976ff5986393e8767f5b2-Paper.pdf) (Burch, 2012)
- **CFR-BR** (CFR-Best Response)
- [Finding Optimal Abstract Strategies in Extensive-Form Games](https://poker.cs.ualberta.ca/publications/AAAI12-cfrbr.pdf) (Burch, 2012) (IMPORTANT)
- **Monte-Carlo CFR** (IMPORTANT)
- **CFR-D (Decomposition)**
- [Solving Imperfect Information Games Using Decomposition](https://poker.cs.ualberta.ca/publications/aaai2014-cfrd.pdf) (Burch, 2013)
- **CFR+**
- (Pseudocode) [Solving Large Imperfect Information Games Using CFR+](https://arxiv.org/pdf/1407.5042.pdf) (Tammelin, 2014)
- [Solving Heads-up Limit Texas Hold’em](https://poker.cs.ualberta.ca/publications/2015-ijcai-cfrplus.pdf) (Tammelin, 2015)
- **RBP** Regret-Based Pruning
- RBP is particularly useful in large games where many actions are suboptimal, but where it is not known beforehand which actions those are
- [Regret-Based Pruning in Extensive-Form Games](https://www.cs.cmu.edu/~noamb/papers/15-NIPS-Regret-Based.pdf) (Brown, 2015)
- Warm Start CFR
- [Strategy-Based Warm Starting for Regret Minimization in Games](https://www.cs.cmu.edu/~noamb/papers/16-AAAI-Strategy-Based.pdf) (Brown, 2015)
- **DCFR** (Discounted CFR)
- [Solving Imperfect-Information Games via Discounted Regret Minimization](https://arxiv.org/abs/1809.04040) (Brown, 2018)
- **ICFR** (instant CFR)
- [Efficient CFR for Imperfect Information Games with Instant Updates](https://realworld-sdm.github.io/paper/27.pdf) (Li, 2019)
- **Deep CFR**
- [Deep Counterfactual Regret Minimization](https://arxiv.org/abs/1811.00164) (Brown, 2018)
- [Combining Deep Reinforcement Learning and Search for Imperfect-Information Games](https://arxiv.org/abs/2007.13544) (Brown, 2020)


Other ideas
- **Depth-Limited Solving** (IMPORTANT): This is a key technique that allows us to train a top tier Poker AI on our local computer, by improving a blueprint strategy.
- [Depth-Limited Solving for Imperfect-Information Games](https://arxiv.org/pdf/1805.08195.pdf) (Brown, 2018)
- **Abstractions** (IMPORTANT): See [[Game Abstraction]]. Abstractions are absolutely necessary, since Texas Hold'Em is too big to solve directly
- [A heads-up no-limit Texas Hold’em poker player: Discretized betting models and automatically generated equilibrium-finding programs](https://www.cs.cmu.edu/~sandholm/tartanian.AAMAS08.pdf)
- [Action Translation in Extensive-Form Games with Large Action Spaces: Axioms, Paradoxes, and the Pseudo-Harmonic Mapping](https://www.cs.cmu.edu/~sandholm/reverse%20mapping.ijcai13.pdf) (Sandholm, 2013)
- [Evaluating State-Space Abstractions in Extensive-Form Games](https://poker.cs.ualberta.ca/publications/AAMAS13-abstraction.pdf) (Burch, 2013)
- [Potential-Aware Imperfect-Recall Abstraction with Earth Mover’s Distance in Imperfect-Information Games](https://www.cs.cmu.edu/~sandholm/potential-aware_imperfect-recall.aaai14.pdf) (Sandholm, 2014)
- [Abstraction for Solving Large Incomplete-Information Games](https://www.cs.cmu.edu/~sandholm/game%20abstraction.aaai15SMT.pdf) (Sandholm, 2015)
- [Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold’em Agent](https://www.cs.cmu.edu/~noamb/papers/15-AAMAS-Tartanian7.pdf) (Brown, 2015)
- Subgame Solving: This seems to be impossible to do on a local computer
- [Safe and Nested Subgame Solving for Imperfect-Information Games](https://arxiv.org/abs/1705.02955) (Brown, 2017)
- Measuring the Size of Poker
- [Measuring the Size of Large No-Limit Poker Games](https://arxiv.org/pdf/1302.7008.pdf) (Johnson, 2013)
- Evaluating the Performance of a Poker Agent
- [A Tool for the Direct Assessment of Poker Decisions](https://poker.cs.ualberta.ca/publications/divat-icgaj.pdf) (Billings, 2006)
- [Strategy Evaluation in Extensive Games with Importance Sampling](https://poker.cs.ualberta.ca/publications/ICML08.pdf) (Bowling, 2008)

Poker Equity: https://www.pokernews.com/strategy/talking-poker-equity-21291.htm

Other Links (Web Pages + Videos)
- https://poker.cs.ualberta.ca/resources.html is really good; see also https://poker.cs.ualberta.ca/general_information.html for general information
- Poker Database: https://poker.cs.ualberta.ca/irc_poker_database.html
- [The State of Techniques for Solving Large Imperfect-Information Games, Including Poker](https://www.youtube.com/watch?v=QgCxCeoW5JI&ab_channel=MicrosoftResearch) by Sandholm, really solid overview about abstractions of the game
- [Superhuman AI for heads-up no-limit poker: Libratus beats top professionals](https://www.youtube.com/watch?v=2dX0lwaQRX0&t=2591s&ab_channel=NoamBrown) by Noam Brown
- [AI for Imperfect-Information Games: Beating Top Humans in No-Limit Poker](https://www.youtube.com/watch?v=McV4a6umbAY&ab_channel=MicrosoftResearch) by Noam Brown at Microsoft Research

Poker Agents Papers
- Slumbot "250,000 core hours and 2 TB of RAM to compute its strategy"
- [Polaris](https://www.ifaamas.org/Proceedings/aamas09/pdf/06_Demos/d_11.pdf) (2008)
- [Baby Tartanian 8](https://www.cs.cmu.edu/~sandholm/BabyTartanian8.ijcai16demo.pdf) (2016) "2 million core hours and 18 TB of RAM to compute its strategy"
- [DeepStack](https://static1.squarespace.com/static/58a75073e6f2e1c1d5b36630/t/58b7a3dce3df28761dd25e54/1488430045412/DeepStack.pdf) (2017)
- [Libratus](https://www.cs.cmu.edu/~noamb/papers/17-IJCAI-Libratus.pdf) (2017)
1. Blueprint Strategy (Full-Game Strategy) using MCCFR
2. Subgame Solving with CFR+
3. Adapt to opponent
- [Pluribus](https://www.cs.cmu.edu/~noamb/papers/19-Science-Superhuman.pdf), video [here](https://www.youtube.com/watch?v=u90TbxK7VEA&ab_channel=TwoMinutePapers) (2019)
Binary file not shown.
Binary file not shown.
Binary file added results/strategy0.png
Binary file added results/strategy1.png
Binary file added results/strategy2.png
Binary file added results/strategy3.png
Binary file added results/strategy4.png
165 changes: 158 additions & 7 deletions slumbot/visualize.ipynb

Large diffs are not rendered by default.
