15 September 2019
The scripts were developed as a review of existing reinforcement learning tools.
The base game class has 6 subclasses:
- matching_pennies;
- prisoner_dilemma;
- stag_hunt;
- RPS (Rock, Paper, and Scissors);
- hawk_dove;
- wrap.
The last class, wrap, takes two two-dimensional NumPy arrays and returns a game with the arrays as its payoff matrices.
All games have exactly 2 agents with an equal number of actions.
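For illustration, here is a minimal sketch of building a custom game through wrap; the module name and the constructor signature are assumptions, not something this README confirms:

```python
# A hedged sketch: the module name "games" and the exact wrap signature
# are assumptions.
import numpy as np
from games import wrap  # hypothetical import path

payoff_a = np.array([[1, -1], [-1, 1]])  # row agent's payoff matrix
payoff_b = -payoff_a                     # column agent's payoffs (zero-sum here)
game = wrap(payoff_a, payoff_b)          # a game with the arrays as payoff matrices
```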
All classes have 4 methods:
- agent_set - Returns the set of agents;
- action_set - Returns the set of actions;
- payoff - Returns the payoff matrices of both agents;
- best_resp - Takes an array of actions and returns the best responses.
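Continuing the sketch above, the four methods would be used like this (the return values shown in the comments are assumptions based on the descriptions):

```python
agents = game.agent_set()      # the set of agents, e.g. {0, 1}
actions = game.action_set()    # the set of actions, e.g. {0, 1}
payoffs = game.payoff()        # payoff matrices of both agents
best = game.best_resp([0, 1])  # best responses to the given action profile
```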
The algorithm script has only one function - PHC (Policy Hill-Climbing). It implements both plain PHC and WoLF-PHC (Win or Learn Fast).
Both algorithms are taken from Bowling, M. and Veloso, M., 2002. Multiagent learning using a variable learning rate. Artificial Intelligence, 136(2), pp. 215-250.
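To show what the function computes, here is a hedged sketch of a single WoLF-PHC update for one agent in a stateless matrix game, following the cited paper; all names here are illustrative, not the identifiers used in the script:

```python
import numpy as np

def wolf_phc_step(Q, pi, pi_avg, action, reward, alpha, delta_win, delta_lose, gamma):
    """One illustrative WoLF-PHC update; pi_avg is the running average policy."""
    n = len(Q)
    # Q-learning update for the action just played (no state in a matrix game)
    Q[action] += alpha * (reward + gamma * np.max(Q) - Q[action])
    # "Win or Learn Fast": small step when the current policy outperforms
    # the average policy, large step otherwise
    delta = delta_win if pi @ Q >= pi_avg @ Q else delta_lose
    # hill-climb: shift probability mass toward the greedy action
    greedy = np.argmax(Q)
    pi[greedy] += delta
    pi[np.arange(n) != greedy] -= delta / (n - 1)
    # simple projection back onto the probability simplex
    pi = np.clip(pi, 0.0, 1.0)
    return Q, pi / pi.sum()
```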
As arguments it takes (an example call is shown after this list):
- game;
- immediate reward;
- gamma (discount parameter);
- alpha_param (the learning rate, i.e. the weight of the new estimate in the Q-value update):
  - it takes [b, a] and puts them into alpha = 1/(b + a*t);
  - the variable t is the current iteration.
- delta_param (the step size of the policy update):
  - if WoLF is True, it takes 3 values [b, a, c], which parameterize the delta for the winning strategy and the delta for the losing strategy;
  - if WoLF is False, it takes 1 value used as a fixed delta in [0,1].
- iter_max (maximum number of iterations);
- explore_rate (exploration rate in [0,1]);
- WoLF (turns on WoLF-PHC when True).
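Putting it all together, a hedged example call; the keyword form mirrors the argument list above, but the exact signature, module name, and return value are assumptions:

```python
from algorithms import PHC  # hypothetical module name

reward = 1.0  # placeholder immediate reward; its exact form is not specified here
policies = PHC(game,                     # a game instance, e.g. the wrap example above
               reward,                   # immediate reward
               gamma=0.9,                # discount parameter
               alpha_param=[1, 0.1],     # alpha = 1/(b + a*t)
               delta_param=[1, 0.1, 2],  # [b, a, c] for the win/loss deltas
               iter_max=10000,           # maximum number of iterations
               explore_rate=0.1,         # exploration rate in [0,1]
               WoLF=True)                # switch on WoLF-PHC
```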
The Jupyter notebook example.ipynb includes a demonstration of the code.