This repository bundles the code and experiments for my Master's thesis on online adaptation in multi-agent reinforcement learning (MARL). The thesis is jointly supervised by Jakob Foerster, with support from his research group at FLAIR, and Arnaud Doucet from the Department of Statistics at the University of Oxford.
The top-level scripts demonstrate the basic functionality and structure of this project.
- 01_step_through_env.py shows how to initialize and step through an iterated lever environment with custom parameters and partner policies (see the first sketch after this list).
- 02_q_learning.py combines the environment with a learner of class `DQNAgent` to perform vanilla Q-learning (a sketch of the training loop follows the list).
- 03_es_meta_learning.py exemplifies how the `OpenES` class, which implements the evolution strategies algorithm Open-ES, can be used to learn initial network weights capable of remembering a fixed partner pattern of length three (see the Open-ES sketch after this list).
- 04_es_learn_history_representations.py shows how evolution strategies can be used to learn the parameters of an LSTM that yields a history representation suitable for effective Q-learning.
- 05_learning_with_drqn.py exemplifies the adaptation baseline, a simple deep recurrent Q-learner based on the work by Hausknecht and Stone (a sketch of its acting loop follows the list).
- 06_step_through_marl_env.py shows how to step through the iterated lever environment with a pair of (possibly learning) agents instead of a fixed partner policy (see the final sketch below).
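
The following sketches give a rough feel for how the scripts above are used. They are illustrations only: apart from `DQNAgent` and `OpenES`, every class name, import path, constructor argument, and method signature is an assumption; the scripts themselves are the authoritative reference.

A minimal sketch of what 01_step_through_env.py does, assuming a gym-style environment class `IteratedLeverEnvironment` and a scripted `FixedPatternPartner` policy (both names hypothetical):

```python
# Sketch of 01_step_through_env.py (all names and signatures are assumptions).
from levers import IteratedLeverEnvironment, FixedPatternPartner  # hypothetical imports

# Environment with per-lever payoffs, an episode length, and a scripted partner
# that repeats a fixed pattern of levers.
env = IteratedLeverEnvironment(
    payoffs=[1.0, 1.0, 1.0],
    n_iterations=100,
    partner=FixedPatternPartner(pattern=[0, 1, 2]),
)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()      # random lever pull, just for illustration
    obs, reward, done = env.step(action)
    print(f"action={action} reward={reward}")
```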
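02_q_learning.py pairs the environment with a `DQNAgent`. The class name comes from the script; the constructor arguments and method names below are assumptions about its interface. A vanilla Q-learning loop might look roughly like this:

```python
# Rough shape of the training loop in 02_q_learning.py (method names are assumptions).
from levers import IteratedLeverEnvironment, FixedPatternPartner  # hypothetical imports
from levers.learner import DQNAgent                               # DQNAgent is named in the script

env = IteratedLeverEnvironment(payoffs=[1.0, 1.0, 1.0], n_iterations=100,
                               partner=FixedPatternPartner(pattern=[0, 1, 2]))
agent = DQNAgent(obs_dim=env.observation_space.shape[0],
                 n_actions=env.action_space.n,
                 learning_rate=1e-3)

for episode in range(500):
    obs = env.reset()
    done = False
    while not done:
        action = agent.act(obs)                              # epsilon-greedy action
        next_obs, reward, done = env.step(action)
        agent.update(obs, action, reward, next_obs, done)    # one Q-learning update
        obs = next_obs
```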
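03_es_meta_learning.py relies on the `OpenES` class to evolve initial network weights, and 04_es_learn_history_representations.py applies the same ask/evaluate/tell pattern to the parameters of an LSTM history encoder. Only the class name `OpenES` comes from the repository; the constructor arguments, the `ask`/`tell` methods, and the fitness function below are assumptions used to show the generic Open-ES loop:

```python
# Generic Open-ES loop, as used (roughly) by 03_es_meta_learning.py and
# 04_es_learn_history_representations.py. Constructor arguments and methods are assumptions.
import numpy as np
from levers.learner import OpenES  # hypothetical import path

def evaluate(flat_params: np.ndarray) -> float:
    # Placeholder fitness. In the scripts, the candidate parameters are loaded into a
    # network (initial weights in 03, LSTM encoder weights in 04) and fitness is the
    # performance achieved on the lever environment.
    return -float(np.sum(flat_params ** 2))

es = OpenES(n_params=1234, population_size=64, sigma=0.1, learning_rate=0.01)
for generation in range(200):
    population = es.ask()                                # sample perturbed parameter vectors
    fitness = np.array([evaluate(p) for p in population])
    es.tell(population, fitness)                         # gradient-estimate update of the mean
print("evolved mean parameters:", es.mean)               # attribute name is an assumption
```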
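05_learning_with_drqn.py is the adaptation baseline: a recurrent Q-learner in the spirit of Hausknecht and Stone's DRQN. The sketch below only illustrates how a recurrent hidden state is threaded through the acting loop; the agent class and all of its methods are assumptions.

```python
# Sketch of the acting loop of a recurrent Q-learner (05_learning_with_drqn.py).
# DRQNAgent and its methods are assumptions; see the script for the real API.
from levers import IteratedLeverEnvironment, FixedPatternPartner  # hypothetical imports
from levers.learner import DRQNAgent                              # hypothetical

env = IteratedLeverEnvironment(payoffs=[1.0, 1.0, 1.0], n_iterations=100,
                               partner=FixedPatternPartner(pattern=[0, 1, 2]))
agent = DRQNAgent(obs_dim=env.observation_space.shape[0], n_actions=env.action_space.n)

for episode in range(500):
    obs = env.reset()
    hidden = agent.initial_hidden()        # recurrent state summarizes the episode so far
    done = False
    while not done:
        action, hidden = agent.act(obs, hidden)
        next_obs, reward, done = env.step(action)
        agent.store(obs, action, reward, next_obs, done)   # sequence replay buffer
        obs = next_obs
    agent.train_on_sequences()             # BPTT update on sampled episode sequences
```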
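06_step_through_marl_env.py replaces the fixed partner with two (possibly learning) agents acting in the same environment. A sketch of the joint-action stepping loop, again with all names beyond `DQNAgent` being assumptions:

```python
# Sketch of stepping a two-agent lever environment (06_step_through_marl_env.py).
# IteratedLeverMARLEnvironment and the per-agent interface are assumptions.
from levers import IteratedLeverMARLEnvironment  # hypothetical import
from levers.learner import DQNAgent              # DQNAgent is named in 02_q_learning.py

env = IteratedLeverMARLEnvironment(payoffs=[1.0, 1.0, 1.0], n_iterations=100)
agents = [DQNAgent(obs_dim=env.observation_space.shape[0], n_actions=env.action_space.n)
          for _ in range(2)]

obs = env.reset()                                 # one observation per agent
done = False
while not done:
    actions = [agent.act(o) for agent, o in zip(agents, obs)]
    next_obs, rewards, done = env.step(actions)   # joint action in, per-agent rewards out
    for agent, o, a, r, o2 in zip(agents, obs, actions, rewards, next_obs):
        agent.update(o, a, r, o2, done)           # each agent learns from its own transition
    obs = next_obs
```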