Update documentation (#323)
* update setup.py

* fix sphinx warnings

* edit handling of docs dependencies

* update docs to latest

* run formatter

* correct getting started doc

* remove script imports

* remove dev installation from .readthedocs.yml

* fix localhost link
cpnota authored Mar 17, 2024
1 parent dbb0d96 commit ac4c444
Showing 6 changed files with 29 additions and 30 deletions.
docs/source/conf.py (4 changes: 2 additions & 2 deletions)

@@ -18,7 +18,7 @@
# -- Project information -----------------------------------------------------

project = 'autonomous-learning-library'
- copyright = '2020, Chris Nota'
+ copyright = '2024, Chris Nota'
author = 'Chris Nota'

# The full version, including alpha/beta/rc tags
@@ -72,4 +72,4 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
- html_static_path = ['_static']
+ # html_static_path = ['_static']
docs/source/guide/basic_concepts.rst (25 changes: 12 additions & 13 deletions)

@@ -160,11 +160,11 @@ A few other quick things to note: ``f.no_grad(x)`` runs a forward pass with ``to
``f.target(x)`` calls the *target network* (an advanced concept used in algorithms such as DQN; see, for example, David Silver's `course notes <http://www0.cs.ucl.ac.uk/staff/d.silver/web/Talks_files/deep_rl.pdf>`_) associated with the ``Approximation``, also with ``torch.no_grad()``.
The ``autonomous-learning-library`` provides a few thin wrappers over ``Approximation`` for particular purposes, such as ``QNetwork``, ``VNetwork``, ``FeatureNetwork``, and several ``Policy`` implementations.
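
For orientation, here is a minimal sketch of how those calls fit together; the ``Approximation`` constructor arguments shown are an assumption rather than the exact signature:

.. code-block:: python

    import torch
    from torch.optim import Adam
    from all.approximation import Approximation

    # wrap a plain PyTorch module and its optimizer
    model = torch.nn.Linear(4, 1)
    f = Approximation(model, Adam(model.parameters(), lr=1e-3))

    x = torch.randn(8, 4)
    y = f(x)                 # forward pass that tracks gradients
    y_frozen = f.no_grad(x)  # forward pass wrapped in torch.no_grad()
    y_target = f.target(x)   # forward pass through the target network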

- Environments
- ------------
+ ALL Environments
+ ----------------

The importance of the ``Environment`` in reinforcement learning nearly goes without saying.
- In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `OpenAI Gym <http://gym.openai.com>`_, the defacto standard library for RL environments.
+ In the ``autonomous-learning-library``, the prepackaged environments are simply wrappers for `Gymnasium <https://gymnasium.farama.org>`_ (formerly OpenAI Gym), the de facto standard library for RL environments.

.. figure:: ./ale.png

@@ -173,15 +173,15 @@ In the ``autonomous-learning-library``, the prepackaged environments are simply

We add a few additional features:

- 1) ``gym`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implemenetations need not consider the difference.
+ 1) ``gymnasium`` primarily uses ``numpy.array`` for representing states and actions. We automatically convert to and from ``torch.Tensor`` objects so that agent implementations need not consider the difference.
2) We add properties to the environment for ``state``, ``reward``, etc. This simplifies the control loop and is generally useful.
3) We apply common preprocessors, such as several standard Atari wrappers. However, where possible, we prefer to perform preprocessing using ``Body`` objects to maximize the flexibility of the agents.

Below, we show how several different types of environments can be created:

.. code-block:: python
- from all.environments import AtariEnvironment, GymEnvironment, PybulletEnvironment
+ from all.environments import AtariEnvironment, GymEnvironment, MujocoEnvironment
# create an Atari environment on the gpu
env = AtariEnvironment('Breakout', device='cuda')
@@ -190,7 +190,7 @@
env = GymEnvironment('CartPole-v0')
# create a PyBullet environment on the cpu
- env = PybulletEnvironment('cheetah')
+ env = MujocoEnvironment('HalfCheetah-v4')
Now we can write our first control loop:

@@ -216,8 +216,8 @@ Of course, this control loop is not exactly feature-packed.
Generally, it's better to use the ``Experiment`` module described later.
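
For reference, a bare-bones version of such a loop might look like the following sketch; the ``state`` and ``reward`` properties are described above, while names such as ``action_space`` and ``state.done`` are assumptions:

.. code-block:: python

    import torch
    from all.environments import GymEnvironment

    env = GymEnvironment('CartPole-v0')
    env.reset()
    returns = 0.0
    while not env.state.done:
        # sample a random action; a real loop would query an Agent instead
        action = torch.tensor(env.action_space.sample())
        env.step(action)
        returns += env.reward
    print('episode return:', returns)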


- Presets
- -------
+ ALL Presets
+ -----------

In the ``autonomous-learning-library``, agents are *compositional*, which means that the behavior of a given ``Agent`` depends on the behavior of several other objects.
Users can compose agents with specific behavior by passing appropriate objects into the constructor of the high-level algorithms contained in ``all.agents``.
@@ -274,8 +274,8 @@ If a ``Preset`` is loaded from disk, then we can instansiate a test ``Agent`` us



- Experiment
- ----------
+ ALL Experiments
+ ---------------

Finally, we have all of the components necessary to introduce the ``run_experiment`` helper function.
``run_experiment`` is the built-in control loop for running reinforcement learning experiments.
@@ -284,7 +284,6 @@ Here is a quick example:

.. code-block:: python
- from gym import envs
from all.experiments import run_experiment
from all.presets import atari
from all.environments import AtariEnvironment
@@ -313,7 +312,7 @@ You can view the results in ``tensorboard`` by running the following command:
tensorboard --logdir runs
- In addition to the ``tensorboard`` logs, every 100 episodes, the mean and standard deviation of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
+ In addition to the ``tensorboard`` logs, every 100 episodes, the mean, standard deviation, min, and max of the previous 100 episode returns are written to ``runs/[agent]/[env]/returns100.csv``.
This is much faster to read and plot than Tensorboard's proprietary format.
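
Because this is a plain CSV file, it is also easy to load for custom analysis; here is a generic pandas sketch (the column names are assumptions, so inspect the file to confirm its layout):

.. code-block:: python

    import pandas as pd
    import matplotlib.pyplot as plt

    # placeholder path following the runs/[agent]/[env] pattern described above;
    # the column names passed to read_csv are assumptions
    df = pd.read_csv(
        'runs/[agent]/[env]/returns100.csv',
        names=['frame', 'mean', 'std', 'min', 'max'],
    )
    plt.plot(df['frame'], df['mean'], label='mean return (last 100 episodes)')
    plt.fill_between(df['frame'], df['mean'] - df['std'], df['mean'] + df['std'], alpha=0.2)
    plt.xlabel('frame')
    plt.ylabel('return')
    plt.legend()
    plt.show()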
The library contains an automatic plotting utility that generates appropriate plots for an *entire* ``runs`` directory as follows:

@@ -324,7 +323,7 @@ The library contains an automatically plotting utility that generates appropriat
This will generate a plot that looks like the following (after tweaking the whitespace through the ``matplotlib`` UI):

- .. image:: ../../../benchmarks/atari40.png
+ .. image:: ../../../benchmarks/atari_40m.png

An optional parameter is ``test_episodes``, which is set to 100 by default.
After running for the given number of frames, the agent will be evaluated for a number of episodes specified by ``test_episodes`` with training disabled.
docs/source/guide/benchmark_performance.rst (2 changes: 1 addition & 1 deletion)

@@ -43,7 +43,7 @@ our agents achieved very similar behavior to the agents tested by DeepMind.
MuJoCo Benchmark
------------------

- `MuJoCo https://mujoco.org`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
+ `MuJoCo <https://mujoco.org>`_ is "a free and open source physics engine that aims to facilitate research and development in robotics, biomechanics, graphics and animation, and other areas where fast and accurate simulation is needed."
The MuJoCo Gym environments are a common benchmark in RL research for evaluating agents with continuous action spaces.
We ran each continuous preset for 5 million timesteps (in this case, timesteps are equal to frames).
The learning rate was decayed over the course of training using cosine annealing.
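
In PyTorch terms, cosine annealing over a fixed training budget looks roughly like the following generic sketch (not the library's internal scheduler code):

.. code-block:: python

    import torch
    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # any model/optimizer pair works the same way
    model = torch.nn.Linear(17, 6)
    optimizer = Adam(model.parameters(), lr=3e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=5_000_000)  # anneal over 5M timesteps

    for step in range(5_000_000):
        optimizer.step()   # stand-in for a real training update
        scheduler.step()   # learning rate follows a cosine curve toward zero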
docs/source/guide/getting_started.rst (16 changes: 8 additions & 8 deletions)

@@ -4,9 +4,9 @@ Getting Started
Prerequisites
-------------

- The Autonomous Learning Library requires a recent version of PyTorch (~=1.8.0 recommended).
+ The Autonomous Learning Library requires a recent version of PyTorch (at least v2.2.0 is recommended).
Additionally, Tensorboard is required in order to enable logging.
- We also strongly recommend using a machine with a fast GPU (at minimum a GTX 970 or better, a GTX 1080ti or better is preferred).
+ We also strongly recommend using a machine with a fast GPU with at least 11 GB of VRAM (a GTX 1080ti or better is preferred).

Installation
------------
@@ -35,7 +35,7 @@ An alternate approach, that may be useful when following this tutorial, is to in
cd autonomous-learning-library
pip install -e .[dev]
- ``dev`` will install all of the optional dependencies for developers of the repo, such as unit test and documentation dependencies, as well as all environments.
+ ``dev`` will install all of the optional dependencies for developers of the repo, such as unit test dependencies, as well as all environments.
If you chose to clone the repository, you can test your installation by running the unit test suite:

.. code-block:: bash
@@ -50,20 +50,20 @@ Running a Preset Agent
The goal of the Autonomous Learning Library is to provide components for building new agents.
However, the library also includes a number of "preset" agent configurations for easy benchmarking and comparison,
as well as some useful scripts.
- For example, a PPO agent can be run on Cart-Pole as follows:
+ For example, an a2c agent can be run on CartPole as follows:

.. code-block:: bash
all-classic CartPole-v0 a2c
- The results will be written to ``runs/a2c_<COMMIT>_<DATETIME>``, where ``<COMMIT>`` and ``<DATATIME>`` are strings generated by the library.
+ The results will be written to ``runs/a2c_CartPole-v0_<DATETIME>``, where ``<DATETIME>`` is generated by the library.
You can view these results and other information through `tensorboard`:

.. code-block:: bash
tensorboard --logdir runs
- By opening your browser to <http://localhost:6006>, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):
+ By opening your browser to `http://localhost:6006`_, you should see a dashboard that looks something like the following (you may need to adjust the "smoothing" parameter):

.. image:: tensorboard.png

@@ -84,9 +84,9 @@ Finally, to watch the trained model in action, we provide a `watch` scripts for

.. code-block:: bash
- all-watch-classic CartPole-v0 runs/a2c_<COMMIT>_<DATETIME>/preset.pt
+ all-watch-classic CartPole-v0 runs/a2c_CartPole-v0_<DATETIME>/preset.pt
You need to find the exact ``<DATETIME>`` by checking the ``runs`` directory.

Each of these scripts can be found in the ``scripts`` directory of the main repository.
- Be sure to check out the ``atari`` and ``continuous`` scripts for more fun!
+ Be sure to check out the ``all-atari`` and ``all-mujoco`` scripts for more fun!
docs/source/index.rst (2 changes: 1 addition & 1 deletion)

@@ -26,7 +26,7 @@ Enjoy!
guide/benchmark_performance

.. toctree::
- :maxdepth: 4
+ :maxdepth: 1
:caption: Modules:

modules/agents
setup.py (10 changes: 5 additions & 5 deletions)

@@ -26,17 +26,17 @@
"torch-testing==0.0.2", # pytorch assertion library
],
"docs": [
- "sphinx~=3.2.1", # documentation library
- "sphinx-autobuild~=2020.9.1", # documentation live reload
- "sphinx-rtd-theme~=0.5.0", # documentation theme
- "sphinx-automodapi~=0.13.0", # autogenerate docs for modules
+ "sphinx~=7.2.6", # documentation library
+ "sphinx-autobuild~=2024.2.4", # documentation live reload
+ "sphinx-rtd-theme~=2.0.0", # documentation theme
+ "sphinx-automodapi~=0.17.0", # autogenerate docs for modules
],
}

extras["all"] = (
extras["atari"] + extras["mujoco"] + extras["pybullet"] + extras["ma-atari"]
)
- extras["dev"] = extras["all"] + extras["test"] + extras["docs"]
+ extras["dev"] = extras["all"] + extras["test"]

setup(
name="autonomous-learning-library",