What's in the package?
The package consists of three parts: the agent base module, the utility module, and the standard agents. A specific agent, say DDPG, inherits from the agent base class, which provides universal functions usable by any child-agent class. The utility module provides components commonly seen in modern DRL algorithms, such as exploration strategies, networks, and replay buffers. All of these are dedicated to facilitating fast implementations. This page explains most of them. Note that they are still evolving.
The class can be accessed as `from drl_implementation.agent.agent_base import Agent`. It expects several arguments to initialize; you can find the docstrings in the source script. The `__init__` function does the following things:
- set a CUDA device and seed PyTorch
- create a universal numpy random number generator
- create file storage directories if necessary
- set up a replay buffer
- set up a normalizer for state normalization if desired
- store some universal algorithmic parameters, such as learning rates
- create a dictionary to store PyTorch networks
- create a dictionary to store training statistics
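The steps above can be sketched as a minimal base-class `__init__`. The class and attribute names below (`AgentBase`, `network_dict`, `statistic_dict`) are illustrative assumptions, not the package's actual definitions:

```python
import numpy as np
import torch


class AgentBase:
    """Minimal sketch of an agent base class __init__ (illustrative, not the package's code)."""

    def __init__(self, seed=0, learning_rate=1e-3, use_cuda=True):
        # set a cuda device and seed pytorch
        self.device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")
        torch.manual_seed(seed)
        # create a universal numpy random number generator
        self.rng = np.random.default_rng(seed)
        # store some universal algorithmic parameters, such as learning rates
        self.learning_rate = learning_rate
        # dictionaries to store pytorch networks and training statistics
        self.network_dict = {}
        self.statistic_dict = {"episode_return": [], "critic_loss": []}
```

A child class would call this `__init__` via `super().__init__(...)` and then register its own networks in the dictionaries.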
It provides functions that can be used by any child-agent class, including:
- `_remember(...)`, for storing data into the replay buffer
- `_soft_update(...)`, for updating target networks with Polyak averaging
- `_save_network(...)` and `_load_network(...)`, for saving and loading PyTorch network checkpoints
- `_save_statistics(...)`, for saving the statistics dictionary using the json module
- `_plot_statistics(...)`, for plotting training statistics using the matplotlib module
The above-mentioned functions are generally applicable to any child-agent class. However, there are four functions that a child-agent class must implement:
- `run(...)`, for running a whole experiment: calling functions to train and test, and to manage, print, plot, and save data
- `_interact(...)`, for interacting with the environment, collecting data, and, most of the time, calling the `_learn(...)` function
- `_select_action(...)`, for producing actions, either exploratory or greedy ones
- `_learn(...)`, for updating the agent's networks
For specifics on what to implement in these four functions, open an agent class script, say DDPG, and dive in.
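To make the division of labor between the four required methods concrete, here is a toy child-agent skeleton. Everything here is hypothetical: the class uses a random policy instead of real networks, and the environment API (`reset()`/`step()` returning `(obs, reward, done)`) is an assumption for illustration only:

```python
import numpy as np


class RandomAgent:
    """Hypothetical child agent showing the four required methods (random policy)."""

    def __init__(self, action_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.action_dim = action_dim
        self.update_count = 0  # stands in for real network updates

    def run(self, env, num_episodes=10):
        # run a whole experiment; printing/plotting/saving would also go here
        for _ in range(num_episodes):
            self._interact(env)

    def _interact(self, env):
        # interact with the environment, collect data, and call _learn
        obs, done = env.reset(), False
        while not done:
            action = self._select_action(obs)
            obs, reward, done = env.step(action)
            self._learn()

    def _select_action(self, obs, test=False):
        # exploratory action; a greedy branch would go here when test=True
        return self.rng.uniform(-1.0, 1.0, size=self.action_dim)

    def _learn(self):
        # update the agent's networks; here we only count the updates
        self.update_count += 1
```

A real agent such as DDPG replaces the random policy with actor/critic networks and stores transitions in the replay buffer inside `_interact(...)`.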