This repository contains code for simulating a cricket game as a Markov Decision Process (MDP). The game involves two batsmen, A and B, and the objective is to maximize the number of runs scored by A. The game is defined by a set of states, actions, and rewards, and can be solved with standard MDP-planning algorithms such as value iteration, policy iteration, and linear programming.
The repository contains the following files:
encoder.py: Contains the implementation of the encoder class, which encodes the game rules and parameters into the MDP transition and reward matrices.
generateMDP.py: Contains the implementation of the MDP class, which generates an episodic or continuing MDP.
planner.py: Contains the implementation of the plan_my_MDP class, which solves the MDP using algorithms such as value iteration, policy iteration, and linear programming.
reader.py: A sample script that reads an episodic MDP file and prints the transition and reward matrices.
main.py: The entry point of the code; wires the encoder, MDP, and planner together.
decoder.py: A script that reads the MDP file produced by encoder.py and reconstructs the original inputs.
data/: A folder that contains the data files for the MDP.
data/mdp/: A folder that contains the episodic MDP files.
The game is encoded using the encoder.py file, which contains the implementation of the encoder class. The class takes the following inputs:
playp: a text file that contains the probability distribution over the outcomes of each action taken by batsman A.
states: a text file that contains a list of all possible states in the game.
q: a numeric parameter supplied on the command line alongside the two files (see the description of main.py below).
A sketch of how the encoder might consume these inputs follows.
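As a rough sketch, the inputs could be parsed as below, assuming a simple layout in which the playp file holds one line of whitespace-separated outcome probabilities per action and the states file holds one state label per line. Both layouts and the helper names are assumptions for illustration, not the repository's documented format.

```python
import numpy as np

def load_player_params(playp_path):
    """Parse the playp file into an action-outcome probability matrix.

    Assumed layout: one line per action, each holding whitespace-separated
    probabilities over that action's possible outcomes.
    """
    rows = []
    with open(playp_path) as f:
        for line in f:
            fields = line.split()
            if fields:
                rows.append([float(x) for x in fields])
    probs = np.array(rows)
    # Each action's outcome distribution should sum to 1.
    assert np.allclose(probs.sum(axis=1), 1.0), "each row must be a distribution"
    return probs

def load_states(states_path):
    """Read the states file: assumed to hold one state label per line."""
    with open(states_path) as f:
        return [line.strip() for line in f if line.strip()]
```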
The main.py file is the entry point of the code. It takes three command-line arguments: the path to the MDP file, the path to the player parameters file, and the value of q. MDP files live in the data/mdp/ directory, and player parameters files live in the data/player_params/ directory.
main.py first creates an instance of the encoder class, passing it the path to the player parameters file and the value of q. The encoder reads the player parameters file and initializes the action-outcome matrix, the list of states, and the transition and reward matrices.
It then creates an instance of the generateMDP class, passing the path to the MDP file, the number of states, and the number of actions. This class reads the MDP file and initializes the transition and reward matrices.
Finally, it creates an instance of the plan_my_MDP class, passing the transition and reward matrices, the discount factor, and the type of MDP. plan_my_MDP computes the value function and the policy for the given MDP using value iteration, policy iteration, or linear programming. A condensed sketch of this pipeline appears below.
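Here is a minimal sketch of what that entry point could look like. The constructor signatures, the attribute names (num_states, num_actions, T, R, gamma, mdp_type), and the (values, policy) return pair are assumptions for illustration, not the repository's actual API.

```python
import sys

from encoder import encoder
from generateMDP import MDP          # class described under generateMDP.py below
from planner import plan_my_MDP

def main():
    # Assumed argument order: MDP file, player parameters file, q.
    mdp_path, playp_path, q = sys.argv[1], sys.argv[2], float(sys.argv[3])

    # Encode the game rules into transition and reward matrices.
    enc = encoder(playp_path, q)

    # Build the MDP from the file, sized to the encoded state/action sets
    # (attribute names are assumed for illustration).
    mdp = MDP(mdp_path, enc.num_states, enc.num_actions)

    # Solve it; vi() could equally be hpi() or lp().
    planner = plan_my_MDP(mdp.T, mdp.R, mdp.gamma, mdp.mdp_type)
    values, policy = planner.vi()
    print(values, policy)

if __name__ == "__main__":
    main()
```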
The encoder.py file contains the encoder class, which encodes the problem of scoring runs in a cricket match as an MDP. The class takes the path to the player parameters file and the value of q as inputs. The player parameters file contains the probabilities of the different outcomes for each action the player can take.
The generateMDP.py file is responsible for generating the MDPs used by the other files. The class defined in this file, MDP, creates an MDP with a given number of states and actions, a type (episodic or continuing), and a discount factor (gamma). The MDP is created by randomly generating transition probabilities and rewards for each state-action combination. The generateEpisodicMDP function creates an episodic MDP, in which certain states are designated as "end states" and the agent's goal is to reach one of them. The generateContinuingMDP function creates a continuing MDP, which has no designated end states and in which the agent's goal is to maximize the expected cumulative reward.
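As a rough illustration of this generation scheme, here is a sketch assuming transitions are stored as an (S, A, S') tensor with a matching reward tensor; the absorbing end-state convention is likewise an assumption.

```python
import numpy as np

def generate_continuing_mdp(num_states, num_actions, gamma, seed=None):
    """Randomly generate transition and reward tensors for a continuing MDP."""
    rng = np.random.default_rng(seed)
    # Random transition probabilities, normalized over next states.
    T = rng.random((num_states, num_actions, num_states))
    T /= T.sum(axis=2, keepdims=True)
    # Random rewards in [-1, 1] for each (s, a, s') triple.
    R = rng.uniform(-1.0, 1.0, size=(num_states, num_actions, num_states))
    return T, R, gamma

def generate_episodic_mdp(num_states, num_actions, gamma, num_end_states=1, seed=None):
    """Same, but the last num_end_states states are absorbing end states."""
    T, R, gamma = generate_continuing_mdp(num_states, num_actions, gamma, seed)
    for s in range(num_states - num_end_states, num_states):
        T[s] = 0.0
        T[s, :, s] = 1.0   # end states transition only to themselves...
        R[s] = 0.0         # ...and yield no further reward
    return T, R, gamma
```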
The planner.py file contains the plan_my_MDP class, which solves the MDPs generated by generateMDP.py. This class has three functions: vi, hpi, and lp. vi solves the MDP using value iteration, hpi uses Howard's policy iteration algorithm, and lp uses linear programming. The class also has a constructor that reads the MDP file and stores the transition and reward matrices.
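For concreteness, a textbook value-iteration routine of the kind vi might implement looks like this; the tensor shapes follow the generation sketch above, and the stopping tolerance is illustrative.

```python
import numpy as np

def value_iteration(T, R, gamma, tol=1e-9):
    """Standard value iteration.

    T: (S, A, S') transition probabilities; R: (S, A, S') rewards.
    Returns the optimal value function and a greedy policy.
    """
    num_states = T.shape[0]
    V = np.zeros(num_states)
    while True:
        # Q[s, a] = sum_{s'} T[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
        Q = np.einsum("sap,sap->sa", T, R + gamma * V[None, None, :])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```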
The reader.py file contains a script that demonstrates how to read an MDP file and store the transition and reward matrices. This file is mainly used for testing and understanding the structure of the MDP files.
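The exact MDP file layout is not documented here; a reader in this spirit could look like the following, assuming a common keyword-per-line format (numStates, numActions, one transition line per nonzero transition, and a discount line). That format is an assumption about this repository, not a documented fact.

```python
import numpy as np

def read_mdp(path):
    """Parse an MDP text file into transition and reward tensors.

    Assumed (illustrative) line format, with numStates/numActions first:
        numStates S
        numActions A
        transition s a s' r p    # one line per nonzero transition
        discount gamma
    """
    T = R = gamma = None
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            key = fields[0]
            if key == "numStates":
                S = int(fields[1])
            elif key == "numActions":
                A = int(fields[1])
                T = np.zeros((S, A, S))
                R = np.zeros((S, A, S))
            elif key == "transition":
                s, a, s2 = int(fields[1]), int(fields[2]), int(fields[3])
                R[s, a, s2] = float(fields[4])
                T[s, a, s2] = float(fields[5])
            elif key == "discount":
                gamma = float(fields[1])
    return T, R, gamma
```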
decoder.py is a script that reads the output generated by encoder.py. encoder.py takes three parameters (the player parameters file, the state file, and the q value) and converts them into an MDP file; decoder.py reads this MDP file and converts it back into the original player parameters file and state file. It does so by reading the MDP file, extracting the transition and reward matrices, and using these matrices to recover the player parameters file and state file.