In this project, our task was to build a MARL (Multi-Agent Reinforcement Learning) Penalty Shot Challenge platform to pit SOTA deep reinforcement learning algorithms against each other. Two agents simulate a penalty shootout, each controlling one of the two playable entities: the **Bar** and the **Puck**.
- This project was done at IIT Bhilai under:
  - Professor: Soumajit Pramanik
  - Course: DS251
- Visualization of the unique environment at every step
- Complete customization of environments and policies
- Asynchronous server for manual interaction with a policy
The `-e` flag is included to make the project packages editable. Log in to [wandb.ai](https://wandb.ai) to record your experimental runs:
```bash
pip install -e .
pip install -e ./gym-env
wandb login
```
Use the files in `utils/config/` to control agent-specific policy hyper-parameters and environment parameters.
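As a rough illustration of the kind of settings such a configuration covers (the key names below are hypothetical examples, not the repo's actual keys; see the files in `utils/config/` for the real format):

```python
# Illustrative only: typical PPO hyper-parameters an agent's config might expose.
ppo_config = {
    "lr": 3e-4,                 # optimizer learning rate
    "gamma": 0.99,              # discount factor
    "eps_clip": 0.2,            # PPO clipping parameter
    "hidden_sizes": [64, 64],   # policy/value network widths
    "epoch": 100,               # number of training epochs
    "batch_size": 256,          # minibatch size per update
}
```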
Example command that trains both the puck and the bar with the PPO algorithm, loads a previously saved policy for each agent, and uses 1 training environment and 2 test environments:

```bash
python ./utils/train.py --wandb-name "ds251_project" --training-num 1 --test-num 2 --puck ppo --bar ppo --load-puck-id both_ppo --load-bar-id both_ppo
```
Open 3 terminals and run one of the following commands in each:

```bash
python ./examples/server/start_server.py
python ./examples/server/agent_puck.py
python ./examples/server/agent_bar.py
```
Click `Start` and use the mouse slider to control the direction of the bar.
It comprises a puck and a bar, with the puck moving horizontally at a constant speed towards the bar. Each entity is independently controlled by its own agent. The puck's objective is to get past the bar and reach the final line, while the bar aims to intercept the puck before it gets there.

The environment is built with the OpenAI Gym library and accepts two action parameters, one for the puck and one for the bar. Each call advances the game by one time step and returns a tuple of state, reward, done flag, and an additional info object.
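A minimal sketch of this loop, assuming the classic Gym API described above (the environment id `PenaltyShot-v0` and the `gym_env` module name are assumptions; check the `gym-env` package's registration for the real names):

```python
import gym
import gym_env  # assumed: installing ./gym-env registers the custom environment

env = gym.make("PenaltyShot-v0")  # placeholder id
obs = env.reset()

done = False
while not done:
    # A single joint action carries both the puck's and the bar's controls.
    action = env.action_space.sample()  # random placeholder actions
    obs, reward, done, info = env.step(action)
    env.render()  # visualize the environment at every step
env.close()
```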
- `lib-agents`: Features trivial, value-based, and policy-based algorithms, including `smurves`, `DQN`, `TD3`, `PPO`, and `DDPG`.
- `comm-agents`: Implements the hardcoded approach for finding a baseline and a pure-exploration strategy, and also implements the mouse slider. In addition, it provides a `TwoAgentPolicyWrapper` to combine the puck and bar policies (see the sketch below).
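A simplified sketch of what such a wrapper does (the real class in `comm-agents` likely follows Tianshou's policy interface and will differ in detail):

```python
import numpy as np

class TwoAgentPolicyWrapper:
    """Sketch: combine two single-agent policies into one joint policy."""

    def __init__(self, puck_policy, bar_policy):
        self.puck_policy = puck_policy
        self.bar_policy = bar_policy

    def act(self, obs):
        # Both sub-policies observe the full state; their actions are
        # concatenated into the joint action the environment expects.
        puck_action = np.atleast_1d(self.puck_policy(obs))
        bar_action = np.atleast_1d(self.bar_policy(obs))
        return np.concatenate([puck_action, bar_action])

# Usage with two stand-in policies:
joint = TwoAgentPolicyWrapper(lambda obs: 1.0, lambda obs: -0.5)
print(joint.act(np.zeros(4)))  # -> [ 1.  -0.5]
```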
- Includes a training script and utility functions that implement wrappers.
- Holds information regarding policy and environment configurations.
To facilitate asynchronous inputs from agents, a central server has been developed to manage the environment. Agents use a client class to connect to the server, calling its step function to submit their actions and receive the corresponding result tuple. The server collects actions from the agents, synchronizes them, and advances the environment by a single time step. A rough usage sketch follows.
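The class name, import path, and constructor arguments below are hypothetical; see the scripts under `examples/server/` for the real interface:

```python
# Hypothetical client usage; the actual client class lives in the
# examples/server code and may differ.
from examples.server.client import Client  # assumed import path

client = Client(host="localhost", port=8000, agent="bar")  # assumed signature
obs = client.reset()

done = False
while not done:
    action = 0.0  # stand-in for a policy's output
    # step() sends this agent's action to the server; the server waits for
    # the other agent, advances the shared environment once, and returns
    # this agent's result tuple.
    obs, reward, done, info = client.step(action)
```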
- Script for playing as the bar against the puck
- A notebook demonstrating `smurves`
Abhishek Kumar (12140040)
Arnav Gautam (12140280)
Dhruv Gupta (12140580)
Mitul Vardhan (12141070)
- We thank Prof. Soumajit Pramanik for providing us with this opportunity to explore and learn more about SOTA algorithms through this project.
- We also thank the creators of Tianshou and the OpenAI Gym library, which form a core part of our codebase.
- We thank the open source community for wonderful libraries for everything under the sun!