diff --git a/notebooks/AdversarialSearch.ipynb b/notebooks/AdversarialSearch.ipynb new file mode 100644 index 0000000000..9a56bbe319 --- /dev/null +++ b/notebooks/AdversarialSearch.ipynb @@ -0,0 +1,482 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Adversarial Search" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This notebook serves as supporting material for the chapter Adversarial Search. Mathematical **game theory** views any multiagent environment as a game, provided that impact of each agent on the other is \"significant\", regardless of whether the agents are cooperative or competitive. In this chapter we cover **competitive** environments, in which the agents' goals are in conflict, giving rise to **adversarial search** problems-often known as **games**. The discussion begins with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when the time is limited. " + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "%classpath add jar ../out/artifacts/aima_core_jar/aima-core.jar" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Games\n", + "\n", + "We first consider games with 2 players, whom we call MAX and MIN. MAX moves first, and then they take turns moving until the game is over.\n", + "Now let's formally define a game. According to the textbook, a game can be formally defined as a kind of search problems with the following elements:\n", + "* $S_0$: The **initial state**, which specifies how the game is set up at the start.\n", + "* $PLAYER(s)$: Defines which player has the move in a state.\n", + "* $ACTIONS(s)$: Returns the set of legal moves in a state.\n", + "* $RESULT(s,a)$: The **transition model**, which defines the result of a move.\n", + "* $TERMINAL$-$TEST(s)$: A terminal test, which is true when the game is over and false otherwise. States, where the game has ended, are called **terminal states**.\n", + "* $UTILITY(s,p)$: A utility function defines the final numeric value for a game that ends in terminal state $s$ for a player $p$. For example, in chess, the outcome is a win, lose, or draw, with values +1, 0, or 1/2.\n", + "\n", + "This six component structure is implemented as an interface named [Game.java](https://github.com/aimacode/aima-java/blob/AIMA3e/aima-core/src/main/java/aima/core/search/adversarial/Game.java) in the repository. Let's have a look at the implementation\n", + "\n", + "````java\n", + "public interface Game {\n", + "\n", + " S getInitialState();\n", + "\n", + " P[] getPlayers();\n", + "\n", + " P getPlayer(S state);\n", + "\n", + " List getActions(S state);\n", + "\n", + " S getResult(S state, A action);\n", + "\n", + " boolean isTerminal(S state);\n", + "\n", + " double getUtility(S state, P player);\n", + "}\n", + "````\n", + "\n", + "The states, actions and players are represented by the generic variables `S`, `A` and `P` respectively. Clearly, the methods represent the\n", + "six components of a particular problem as follows:\n", + "* initial state ← `getInitialState()`\n", + "* player having the move at current state ← `getPlayer(S state)`\n", + "* applicable actions ← `getActions(S state)`\n", + "* the transition model ← `getResult(S state, A action)`\n", + "* the terminal test ← `isTerminal(S state)`\n", + "* utility function ← `getUtility(S state, P player)`\n", + "\n", + "The initial state, ACTIONS function, and RESULT function define the **game tree** for the game-a tree where the nodes are game states and edges are moves. Note that, regardless of the size of the game tree, it is MAX's job to search for a good move. We use the term **search tree** for a tree that is superimposed on the full game tree, and examines enough nodes to allow a player to determine what move to make." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Optimal Decision in Games\n", + "\n", + "In Adversarial search, MIN's move impacts MAX's next move. MAX, therefore, must find a contingent strategy, which specifies MAX's move in the initial state, then MAX's moves in the states resulting from every possible response by MIN, then MAX's moves in the states resulting from every possible response by MIN to those moves, and so on. Let's begin with \"How to find this optimal strategy?\"\n", + "\n", + "Given a game tree, the optimal strategy can be determined from the **minimax value** of each node, which we write as $MINIMAX(n)$. The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of the terminal state is just its utility. Furthermore, given a choice, MAX prefers to move to a state of the maximum value, whereas MIN prefers a state of minimum value. Therefore:\n", + "\n", + "$MINIMAX(s) = \\left\\{\\begin{matrix}\n", + "UTILITY(s) & if & TERMINAL-TEST(s) \\\\ \n", + "max{_{a\\in Actions(s)}MINIMAX(RESULT(s,a))} & if & PLAYER(s) = MAX\\\\ \n", + "min{_{a\\in Actions(s)}MINIMAX(RESULT(s,a))} & if & PLAYER(s) = MIN \n", + "\\end{matrix}\\right.$" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This definition of optimal play for MAX assumes that MIN also plays optimally-it maximizes the worst case outcome for MAX. If MIN does not play optimally then MAX will do even better." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### The Minimax Algorithm\n", + "\n", + "The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state. The recursion proceeds all the way down to the leaves of the game tree and then the minimax values are backed up through the tree as the recursion unwinds. Therefore, it performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is $m$ and there are $b$ legal moves at each point, then the time complexity of the minimax algorithm is $O(b^m)$.\n", + "\n", + "Let's have a look at the pseudo code of minimax decision." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "### AIMA3e\n", + "__function__ MINIMAX-DECISION(_state_) __returns__ _an action_ \n", + " __return__ arg max _a_ ∈ ACTIONS(_s_) MIN\\-VALUE(RESULT(_state_, _a_)) \n", + "\n", + "---\n", + "__function__ MAX\\-VALUE(_state_) __returns__ _a utility value_ \n", + " __if__ TERMINAL\\-TEST(_state_) __then return__ UTILITY(_state_) \n", + " _v_ ← −∞ \n", + " __for each__ _a_ __in__ ACTIONS(_state_) __do__ \n", + "   _v_ ← MAX(_v_, MIN\\-VALUE(RESULT(_state_, _a_))) \n", + " __return__ _v_ \n", + "\n", + "---\n", + "__function__ MIN\\-VALUE(_state_) __returns__ _a utility value_ \n", + " __if__ TERMINAL\\-TEST(_state_) __then return__ UTILITY(_state_) \n", + " _v_ ← ∞ \n", + " __for each__ _a_ __in__ ACTIONS(_state_) __do__ \n", + "   _v_ ← MIN(_v_, MAX\\-VALUE(RESULT(_state_, _a_))) \n", + " __return__ _v_ \n", + "\n", + "---\n", + "__Figure__ ?? An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX\\-VALUE and MIN\\-VALUE go through the whole game tree, all the way to the leaves, to determine the backed\\-up value of a state. The notation argmax _a_ ∈ _S_ _f_(_a_) computes the element _a_ of set _S_ that has maximum value of _f_(_a_).\n", + "\n", + "---\n", + "__function__ EXPECTIMINIMAX(_s_) = \n", + " UTILITY(_s_) __if__ TERMINAL\\-TEST(_s_) \n", + " max_a_ EXPECTIMINIMAX(RESULT(_s, a_)) __if__ PLAYER(_s_)= MAX \n", + " min_a_ EXPECTIMINIMAX(RESULT(_s, a_)) __if__ PLAYER(_s_)= MIN \n", + " ∑_r_ P(_r_) EXPECTIMINIMAX(RESULT(_s, r_)) __if__ PLAYER(_s_)= CHANCE" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%python\n", + "from notebookUtils import *\n", + "pseudocode('Minimax Decision')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Above pseudo code is implemented [here](https://github.com/aimacode/aima-java/blob/AIMA3e/aima-core/src/main/java/aima/core/search/adversarial/MinimaxSearch.java) in the code repository. Now let's try to analyze the minimax decision in a Tic-Tac-toe game." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MINI MAX DEMO\n", + "\n", + "X playing ... \n", + "X - - \n", + "- - - \n", + "- - - \n", + "\n", + "O playing ... \n", + "X - - \n", + "- O - \n", + "- - - \n", + "\n", + "X playing ... \n", + "X - - \n", + "X O - \n", + "- - - \n", + "\n", + "O playing ... \n", + "X - - \n", + "X O - \n", + "O - - \n", + "\n", + "X playing ... \n", + "X - X \n", + "X O - \n", + "O - - \n", + "\n", + "O playing ... \n", + "X O X \n", + "X O - \n", + "O - - \n", + "\n", + "X playing ... \n", + "X O X \n", + "X O - \n", + "O X - \n", + "\n", + "O playing ... \n", + "X O X \n", + "X O O \n", + "O X - \n", + "\n", + "X playing ... \n", + "X O X \n", + "X O O \n", + "O X X \n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "null" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import aima.core.environment.tictactoe.TicTacToeGame;\n", + "import aima.core.environment.tictactoe.TicTacToeState;\n", + "import aima.core.search.adversarial.AdversarialSearch;\n", + "import aima.core.search.adversarial.MinimaxSearch;\n", + "import aima.core.util.datastructure.XYLocation;\n", + "\n", + "System.out.println(\"MINI MAX DEMO\\n\");\n", + "TicTacToeGame game = new TicTacToeGame();\n", + "TicTacToeState currState = game.getInitialState();\n", + "AdversarialSearch search = MinimaxSearch\n", + " .createFor(game);\n", + "while (!(game.isTerminal(currState))) {\n", + " System.out.println(game.getPlayer(currState) + \" playing ... \");\n", + " XYLocation action = search.makeDecision(currState);\n", + " currState = game.getResult(currState, action);\n", + " System.out.println(currState);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optimal decisions in multiplayer games\n", + "\n", + "Here we need to replace the single value for each node with a _vector_ of values. For example, in a 3-player game with players A, B, and C, a vector $$ is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. The simplest way to implement this is to have the $UTILITY$ function return a vector of utilities. Hence the backed-up value of a node $n$ is always the utility vector of the successor state with the highest value for the player playing at $n$. It is important to notice that multiplayer games usually involve **alliances** among the players. Alliances are made and broken as the game proceeds. For example, suppose A and B are in a weak position and C is in a stronger position. then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, collaboration emerges with purely selfish behavior. Of course, as soon as C weakens, the alliance loses its value, and either A or B could violate the agreement. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Alpha-Beta Pruning\n", + "\n", + "The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. The particular technique, called **alpha-beta pruning**, when applied to a standard minimax tree, returns the same move as the minimax would, but prunes away the branches that cannot influence the final decision. Alpha-beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves. The general principle is: consider a node $n$ somewhere in the tree such that player has a choice of moving to that node. If the player has a better choice $m$ either at the parent node of $n$ or at any choice-point further up, then _n will never be reached in actual play_. So once we have found out enough about $n$ to reach this conclusion, we can prune it.\n", + "\n", + "Alpha and Beta are the 2 parameters that describe bounds on the backed-up value that appear anywhere along the path:\n", + "* $\\alpha$ = the highest value choice we have found so far at any choice-point along the path i.e. value of the best choice found so far for MAX\n", + "* $\\beta$ = the lowest value choice we have found so far at any choice-point along the path i.e. value of the best choice found so far for MIN\n", + "\n", + "Alpha-beta search updates the value of $\\alpha$ and $\\beta$ as it goes along and prunes the remaining branch at a node as soon as the vvalue of the current node is known to be worse than the current $\\alpha$ or $\\beta$ value for MAX and MIN respectively." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's have a look at the pseudo code of Alpha-Beta Search. " + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/markdown": [ + "### AIMA3e\n", + "__function__ ALPHA-BETA-SEARCH(_state_) __returns__ an action \n", + " _v_ ← MAX\\-VALUE(_state_, −∞, +∞) \n", + " __return__ the _action_ in ACTIONS(_state_) with value _v_ \n", + "\n", + "---\n", + "__function__ MAX\\-VALUE(_state_, _α_, _β_) __returns__ _a utility value_ \n", + " __if__ TERMINAL\\-TEST(_state_) __then return__ UTILITY(_state_) \n", + " _v_ ← −∞ \n", + " __for each__ _a_ __in__ ACTIONS(_state_) __do__ \n", + "   _v_ ← MAX(_v_, MIN\\-VALUE(RESULT(_state_, _a_), _α_, _β_)) \n", + "   __if__ _v_ ≥ _β_ __then return__ _v_ \n", + "   _α_ ← MAX(_α_, _v_) \n", + " __return__ _v_ \n", + "\n", + "---\n", + "__function__ MIN\\-VALUE(_state_, _α_, _β_) __returns__ _a utility value_ \n", + " __if__ TERMINAL\\-TEST(_state_) __then return__ UTILITY(_state_) \n", + " _v_ ← +∞ \n", + " __for each__ _a_ __in__ ACTIONS(_state_) __do__ \n", + "   _v_ ← MIN(_v_, MAX\\-VALUE(RESULT(_state_, _a_), _α_, _β_)) \n", + "   __if__ _v_ ≤ _α_ __then return__ _v_ \n", + "   _β_ ← MIN(_β_, _v_) \n", + " __return__ _v_ \n", + "\n", + "\n", + "---\n", + "__Figure__ ?? The alpha\\-beta search algorithm. Notice that these routines are the same as the MINIMAX functions in Figure ??, except for the two lines in each of MIN\\-VALUE and MAX\\-VALUE that maintain _α_ and _β_ (and the bookkeeping to pass these parameters along)." + ], + "text/plain": [ + "" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "%%python\n", + "from notebookUtils import *\n", + "pseudocode('Alpha Beta Search')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Above pseudo code is implemented [here](https://github.com/aimacode/aima-java/blob/AIMA3e/aima-core/src/main/java/aima/core/search/adversarial/AlphaBetaSearch.java). Let's analyze the states of TicTacToe game played using Alpha-Beta search." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ALPHA BETA DEMO\n", + "\n", + "X playing ... \n", + "X - - \n", + "- - - \n", + "- - - \n", + "\n", + "O playing ... \n", + "X - - \n", + "- O - \n", + "- - - \n", + "\n", + "X playing ... \n", + "X - - \n", + "X O - \n", + "- - - \n", + "\n", + "O playing ... \n", + "X - - \n", + "X O - \n", + "O - - \n", + "\n", + "X playing ... \n", + "X - X \n", + "X O - \n", + "O - - \n", + "\n", + "O playing ... \n", + "X O X \n", + "X O - \n", + "O - - \n", + "\n", + "X playing ... \n", + "X O X \n", + "X O - \n", + "O X - \n", + "\n", + "O playing ... \n", + "X O X \n", + "X O O \n", + "O X - \n", + "\n", + "X playing ... \n", + "X O X \n", + "X O O \n", + "O X X \n", + "\n" + ] + }, + { + "data": { + "text/plain": [ + "null" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import aima.core.environment.tictactoe.TicTacToeGame;\n", + "import aima.core.environment.tictactoe.TicTacToeState;\n", + "import aima.core.search.adversarial.AdversarialSearch;\n", + "import aima.core.search.adversarial.AlphaBetaSearch;\n", + "import aima.core.util.datastructure.XYLocation;\n", + "\n", + "System.out.println(\"ALPHA BETA DEMO\\n\");\n", + "TicTacToeGame game = new TicTacToeGame();\n", + "TicTacToeState currState = game.getInitialState();\n", + "AdversarialSearch search = AlphaBetaSearch\n", + " .createFor(game);\n", + "while (!(game.isTerminal(currState))) {\n", + " System.out.println(game.getPlayer(currState) + \" playing ... \");\n", + " XYLocation action = search.makeDecision(currState);\n", + " currState = game.getResult(currState, action);\n", + " System.out.println(currState);\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Entire game tree and the optimal decision making in Tic-Tac-Toe game can be visualized [here](http://aimacode.github.io/aima-javascript/5-Adversarial-Search/). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Java", + "language": "java", + "name": "java" + }, + "language_info": { + "codemirror_mode": "text/x-java", + "file_extension": ".java", + "mimetype": "", + "name": "Java", + "nbconverter_exporter": "", + "version": "1.8.0_144" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": false, + "sideBar": false, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": false, + "toc_window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}