This document serves as a quick overview of the usage and features of the Hearts Gym, a multi-agent environment for training agents to play the Hearts card game. It also includes a client-server architecture to remotely evaluate local agents. Finally, any number of hard-coded baseline agents may be implemented with ease.
The rules are mostly implemented as specified by the modern rules in Morehead (2001) (ISBN: 9780451204844). For a more detailed description of the rules and the differences from the original, execute the following:
python -m pydoc hearts_gym.envs.hearts_game.HeartsGame
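The same documentation can also be viewed from within a Python session, for example:

```python
# Equivalent to the `pydoc` invocation above.
from hearts_gym.envs.hearts_game import HeartsGame

help(HeartsGame)
```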
Supported Python versions are shown in `setup.py` under the `python_requires` argument. If your system does not have the correct version (it will complain at some point during the installation), you can use the Conda installation instructions.
Clone this repository so you can easily modify it, replacing `<repo-uri>` with the URI of this repo.
git clone <repo-uri>
cd hearts-gym
# If `python3` is not found, try `python`.
# If the `venv` module is not found, please install it.
python3 -m venv --system-site-packages ./env
# On Unix:
source ./env/bin/activate
# On Windows:
.\env\Scripts\activate
# Do not use `python3` from this point onward!
Install at least one of PyTorch or TensorFlow as your deep learning framework (RLlib also has experimental JAX support if you feel adventurous).
After installing a deep learning framework, in the root directory of the repository clone, execute:
python -m pip install --upgrade pip
python -m pip install -e .
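As an optional sanity check, the package should now be importable from a Python session started in any directory:

```python
# Quick check that the editable install succeeded.
import hearts_gym  # noqa: F401

print('hearts_gym imported successfully')
```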
You are done! Head over to the usage section.
To install a Python version different from your system's, below you can find instructions for Miniconda. If you already have the `conda` command available, you do not need to install Miniconda.
After installing Miniconda, execute the following:
conda create -n hearts-gym python=3.8 zlib
conda activate hearts-gym
You now have a Conda environment for development. However, you still need to install the requirements.
You will need to execute one of the following lines each time you start a new shell. This will activate the Python virtual environment we are using:
# On Unix:
source ./env/bin/activate
# On Windows:
.\env\Scripts\activate
# If using Conda installation:
conda activate hearts-gym
We use RLlib with the recommended Tune API to manage training experiments.
The main script for starting a training run is `train.py`, started like this:
python train.py
If you encounter memory errors, the simplest solution is to set a lower number of worker processes (`num_workers` in the `config` dictionary). By default, all CPUs and all GPUs are used.
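For example, lowering the worker count could look roughly like this in `configuration.py` (a sketch; the actual `config` dictionary in that file contains many more entries):

```python
# Sketch of the relevant entries in the `config` dictionary; the real
# dictionary in `configuration.py` contains many more options.
config = {
    # ...
    'num_workers': 2,  # fewer rollout workers -> less memory usage
    'num_gpus': 0,     # set to 0 to not use a GPU at all
}
```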
If everything worked correctly, you should see a table summarizing test results of your learned agent against other agents printed on your terminal. If you see the table, you can ignore any other errors displayed by Ray. The table looks something like this:
[...]
(pid=10101) SystemExit: 1 # Can be ignored.
[...]
testing took 1.23456789 seconds
# illegal action (player 0): 0 / 52
# illegal action ratio (player 0): 0.0
| policy | # rank 1 | # rank 2 | # rank 3 | # rank 4 | total penalty |
|---------+----------+----------+----------+----------+---------------|
| learned | 1 | 0 | 0 | 0 | 0 |
| random | 0 | 1 | 0 | 0 | 5 |
| random | 0 | 1 | 0 | 0 | 5 |
| random | 0 | 0 | 0 | 1 | 16 |
The table lists the policy, the number of placements in each rank, and the accumulated penalty over all test games for each player. As you can see from this example with a single test game, players with the same penalty score get the highest of their rankings.
A central role in `train.py` is played by the file `configuration.py`, which contains lots of configuration options that are described here. Results including configuration and checkpoints are saved in the `results` directory by default. You can list directories containing checkpoints with `python show_checkpoint_dirs.py`. When you want to share your checkpoints, check out the corresponding section. After training, your agent is automatically evaluated as well.
To optimize your agent, the main thing you want to modify is the `hearts_gym.RewardFunction.compute_reward` method in `hearts_gym/envs/reward_function.py`, with which you can shape the reward function for your agent, adjusting its behaviour. Variables and functions that may help you during this step are described in `docs/reward-shaping.md`.
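As a purely illustrative sketch of the idea (the helper below is hypothetical and not part of the Hearts Gym API; the variables actually available inside `compute_reward` are listed in `docs/reward-shaping.md`), a shaped reward could, for example, punish penalty points collected in a trick:

```python
# Hypothetical reward-shaping helper; `trick_penalty` and `won_trick`
# are illustrative stand-ins for values you would derive from the
# environment inside `compute_reward`.
def shaped_reward(trick_penalty: int, won_trick: bool) -> float:
    """Punish collected penalty cards, mildly reward clean tricks."""
    reward = -float(trick_penalty)
    if won_trick and trick_penalty == 0:
        # Taking a trick without penalty cards is mildly encouraged
        # here; adjust or remove this bonus to change behaviour.
        reward += 0.1
    return reward
```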
You should not modify the observations of the environment because we rely on their structure for remote interaction. If you do decide to modify them, you need to apply the same transformations in the `eval_agent.py` script so that the observations received from the remote server match what your trained model expects.
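One way to keep the two in sync is to put the transformation into a single shared function and call it both during training and in `eval_agent.py`; the function below is a made-up placeholder, not part of the repository:

```python
# Hypothetical shared observation transform; whatever is applied during
# training must be applied identically to observations received from
# the server in `eval_agent.py`.
import numpy as np


def transform_observation(obs: np.ndarray) -> np.ndarray:
    """Placeholder transform: cast the observation to float32."""
    return obs.astype(np.float32)
```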
Aside from the local evaluation in `train.py`, you can start a server and connect to it with different clients. You may want to configure the variables `SERVER_ADDRESS` and `PORT` in `hearts_gym/envs/hearts_server.py` to obtain sensible defaults.
To start the server, execute the following:
python start_server.py --num_parallel_games 16
To connect to the server for evaluation, execute the following:
python eval_agent.py --name <name> --algorithm <algo> <checkpoint_path>
Replace `<name>` with a name you want to have displayed, `<algo>` with the name of the algorithm you used for training the agent, and `<checkpoint_path>` with the path to a checkpoint. The rest of the configuration is loaded from the `params.pkl` file next to the checkpoint's directory; if that file is missing, you have to configure `configuration.py` according to the checkpoint you are loading. Here is an example:
python eval_agent.py --name '🂭-collector' --algorithm PPO results/PPO/PPO_Hearts-v0_00000_00000_0_1970-01-01_00-00-00/checkpoint_000002/checkpoint-2
Since the server will wait until enough players are connected, you should either execute the `eval_agent.py` script multiple times in different shells or allow the server to use simulated agents. When a client disconnects during games, they will be replaced with a randomly acting agent.
The evaluation statistics are currently not communicated to the clients, so either log them on the client or check the server output for more information.
In `configuration.py`, you will find several configuration options and dictionaries such as `stop_config`, `model_config` or the main `config`. These are used to configure RLlib; possible options and default values can be found at the following locations:
| Configuration | Textual | Code |
|---|---|---|
| `env_config` | `python -m pydoc hearts_gym.HeartsEnv.__init__` | `hearts_gym/envs/hearts_env.py` |
| `model_config` | RLlib Models | `ray/rllib/models/catalog.py` |
| `config` | RLlib Training | `ray/rllib/agents/trainer.py` |
| Algorithm-specific config | RLlib Algorithms (bottom of each algorithm's section) | `ray/rllib/agents/<algo>/<algo>.py` (replace `<algo>` with the algorithm's name) |
| `stop_config` | Tune Guide | `ray/python/ray/tune/tune.py` |
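For orientation, these dictionaries are roughly wired together as follows when Tune starts a run; `train.py` already does the equivalent for you, so treat this as a simplified sketch rather than the exact code:

```python
# Simplified sketch of how the configuration dictionaries feed into
# Tune; `train.py` performs the equivalent with more options.
from ray import tune

tune.run(
    'PPO',             # name of the RLlib algorithm to train with
    config=config,     # main RLlib config, includes `env_config` and `model`
    stop=stop_config,  # stopping criteria, e.g. a timestep budget
    checkpoint_at_end=True,
)
```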
There is no pre-existing rule-based agent; the default one may be implemented in the file `hearts_gym/policies/rule_based_policy_impl.py` by implementing the `compute_action` method. It is available under the policy ID `"rulebased"` by default.
For more information on this topic, including what to look out for and how to implement multiple rule-based agents, refer to `docs/rule_based_policy.md`.
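As a toy illustration of the kind of logic that goes into `compute_action` (the helper below is hypothetical; the actual arguments and helpers available in `rule_based_policy_impl.py` are described in `docs/rule_based_policy.md`):

```python
# Hypothetical rule: among the legal actions, always pick the one with
# the lowest index. The real `compute_action` works on observations and
# must return an action index.
from typing import Sequence


def lowest_legal_action(legal_actions: Sequence[int]) -> int:
    """Return the smallest legal action index."""
    assert len(legal_actions) > 0, 'there is always at least one legal action'
    return min(legal_actions)
```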
As Ray takes quite some time to initialize, for a faster development workflow you can use the `mypy` typechecker. To check types for the `train.py` script, execute the following:
mypy train.py
`mypy` gives several helpful hints; types not matching may be an indicator of an issue.
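As a tiny, made-up example of the kind of mistake `mypy` catches without having to wait for Ray to start:

```python
# `mypy` would flag this function: the annotated return type (`float`)
# does not match the value actually returned (`str`).
def total_penalty(points: int) -> float:
    return str(points)
```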
This is completely optional and not required in any way to work on the code. Whether you want to use type hints and type checking is entirely up to your preference.
For less clutter and an easier debugging setup, set the `num_gpus` and `num_workers` configuration values to 0.
See the list of RLlib algorithms.
You can filter algorithms via the following rules:
- We act in a discrete action space, so we require "Discrete Actions → Yes".
- We have a multi-agent environment, so we require "Multi-Agent → Yes".
- Action masking is supported for algorithms with the "Discrete Actions → Yes +parametric" label.
- Auto-wrapping models with an LSTM requires "Model Support" for "+RNN, +LSTM auto-wrapping".
- Auto-wrapping models with an Attention function requires "Model Support" for "+LSTM auto-wrapping, +Attention".
Some models require special settings or a re-implementation to support action masking. For example, when using the DQN algorithm, the `hiddens` configuration option is automatically set to an empty list when setting up action masking. When you try out another algorithm with action masking and it fails for a weird reason, you may have to modify its settings or re-implement it altogether with explicit support.
When action masking is enabled (`mask_actions = True`) together with model auto-wrapping (whether via `model['use_lstm']` or `model['use_attention']` does not matter), you will notice that the respective auto-wrapping configuration option is set to `False` during training setup.
This is required for our action masking wrappers to work; however, the `model['custom_model']` configuration option will have either `_lstm` or `_attn` appended to it when the model is wrapped in an LSTM or Attention function, respectively.
Remember, this behavior only occurs when action masking is enabled!
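As a rough illustration of what this means for the model configuration (the custom model name below is a placeholder, not the one used in the repository):

```python
# What you configure (placeholder custom model name):
model_config = {
    'custom_model': 'my_masked_model',
    'use_lstm': True,
}

# Roughly what is in effect during training setup when action masking
# is enabled: auto-wrapping is switched off and the custom model name
# receives a suffix instead.
effective_model_config = {
    'custom_model': 'my_masked_model_lstm',
    'use_lstm': False,
}
```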
To share a checkpoint, you need the whole directory containing the checkpoint file (as listed by `python show_checkpoint_dirs.py`). In addition, you may want to share the `params.pkl` file next to the directory containing the checkpoint to share its configuration as well.
RLlib automatically creates TensorBoard summaries, allowing you to monitor statistics of your models during (or after) training. Start it with the following line:
tensorboard --logdir results
Note that usage of this is completely optional; TensorBoard is not an installation requirement.
By default, the `eval_agent.py` script automatically loads parameters used for model training from a pickle file so the script does not have to be re-configured for each checkpoint. If you obtain checkpoints that include a `params.pkl` file from an untrusted source and load them, arbitrary code may be executed.
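To see why this matters, the following self-contained toy example shows that unpickling data can execute code chosen by whoever created the pickle (here it merely prints a message):

```python
# Demonstration that `pickle.loads` can run arbitrary code embedded in
# the pickled data; a malicious `params.pkl` could do the same.
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle to call `print(...)` when the object is loaded.
        return print, ('code executed during unpickling',)


malicious_bytes = pickle.dumps(Payload())
pickle.loads(malicious_bytes)  # prints the message above
```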
To avoid this security issue, set `allow_pickles = False` in `configuration.py`. Note that you then have to configure `configuration.py` for each checkpoint you want to load in `eval_agent.py` so the configuration matches.
- General:
- Hearts:
- Environments:
- Policies:
- Models: