[Doc] Document losses in README.md
ghstack-source-id: b75d4e08349532b001c91ea3ae5f1e796de26ec5
Pull Request resolved: #2408
vmoens committed Sep 12, 2024
1 parent b1eedbf commit b67d893
Showing 1 changed file (README.md) with 273 additions and 13 deletions.
@@ -523,19 +523,279 @@ If you would like to contribute to new features, check our [call for contributio
## Examples, tutorials and demos

A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) is provided for illustrative purposes:
- [DQN](https://github.com/pytorch/rl/blob/main/sota-implementations/dqn)
- [DDPG](https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py)
- [IQL](https://github.com/pytorch/rl/blob/main/sota-implementations/iql/iql_offline.py)
- [CQL](https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py)
- [TD3](https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py)
- [TD3+BC](https://github.com/pytorch/rl/blob/main/sota-implementations/td3+bc/td3+bc.py)
- [A2C](https://github.com/pytorch/rl/blob/main/examples/a2c_old/a2c.py)
- [PPO](https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/ppo.py)
- [SAC](https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py)
- [REDQ](https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py)
- [Dreamer](https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py)
- [Decision Transformers](https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer)
- [RLHF](https://github.com/pytorch/rl/blob/main/examples/rlhf)

<table>
<tr>
<td><strong>Algorithm</strong>
</td>
<td><strong>Compile Support**</strong>
</td>
<td><strong>Tensordict-free API</strong>
</td>
<td><strong>Modular Losses</strong>
</td>
<td><strong>Continuous and Discrete</strong>
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/dqn">DQN</a>
</td>
<td> 1.53x
</td>
<td> +
</td>
<td> NA
</td>
<td> + (through <a href="https://pytorch.org/rl/stable/reference/generated/torchrl.envs.transforms.ActionDiscretizer.html?highlight=actiondiscretizer">ActionDiscretizer</a> transform)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py">DDPG</a>
</td>
<td> 1.54x
</td>
<td> +
</td>
<td> +
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/iql/">IQL</a>
</td>
<td> 2.55x
</td>
<td> +
</td>
<td> +
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py">CQL</a>
</td>
<td> 1.91x
</td>
<td> +
</td>
<td> +
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py">TD3</a>
</td>
<td> 1.79x
</td>
<td> +
</td>
<td> +
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td>
<a href="https://github.com/pytorch/rl/blob/main/sota-implementations/td3_bc/td3_bc.py">TD3+BC</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> +
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td>
<a href="https://github.com/pytorch/rl/blob/main/examples/a2c/">A2C</a>
</td>
<td> 1.76x
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td>
<a href="https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/">PPO</a>
</td>
<td> 2.67x
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py">SAC</a>
</td>
<td> 2.01x
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py">REDQ</a>
</td>
<td> 2.35x
</td>
<td> +
</td>
<td> -
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py">Dreamer v1</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> + (<a href="https://pytorch.org/rl/stable/reference/objectives.html#dreamer">different classes</a>)
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer">Decision Transformers</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> NA
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/crossq">CrossQ</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> +
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
   <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/gail">GAIL</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> NA
</td>
<td> +
</td>
</tr>
<tr>
   <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/impala">IMPALA</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/iql.py">IQL (MARL)</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> +
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/maddpg_iddpg.py">DDPG (MARL)</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> +
</td>
<td> - (continuous only)
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/mappo_ippo.py">PPO (MARL)</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/qmix_vdn.py">QMIX-VDN (MARL)</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> NA
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/sac.py">SAC (MARL)</a>
</td>
<td> untested
</td>
<td> +
</td>
<td> -
</td>
<td> +
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/examples/rlhf">RLHF</a>
</td>
<td> NA
</td>
<td> +
</td>
<td> NA
</td>
<td> NA
</td>
</tr>
</table>

** The number indicates the expected speed-up over eager mode when executed on CPU. Numbers may vary depending on
architecture and device.
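
To make the footnote concrete, here is a minimal sketch of how a speed-up ratio of this kind could be estimated. It is not the benchmark used to produce the numbers in the table: the helper names `median_runtime` and `speedup` are hypothetical, and the two workloads are plain-Python stand-ins. With PyTorch installed, one would presumably pass an eager call such as `lambda: module(x)` as the baseline and a call to a module wrapped once with `torch.compile` as the candidate.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], warmup: int = 3, repeats: int = 20) -> float:
    """Return the median wall-clock time (seconds) of one call to `fn`."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time costs (e.g. compilation, caches)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline: Callable[[], object], candidate: Callable[[], object],
            warmup: int = 3, repeats: int = 20) -> float:
    """Ratio baseline / candidate; a value above 1 means the candidate is faster."""
    t_base = median_runtime(baseline, warmup=warmup, repeats=repeats)
    t_cand = median_runtime(candidate, warmup=warmup, repeats=repeats)
    return t_base / t_cand


if __name__ == "__main__":
    # Stand-in workloads (assumptions for illustration only).
    def eager_like():
        return sum(i * i for i in range(50_000))

    def compiled_like():
        return sum(i * i for i in range(5_000))

    print(f"{speedup(eager_like, compiled_like):.2f}x")
```

Using the median rather than the mean makes the estimate less sensitive to occasional slow iterations, and the warm-up calls keep compilation time out of the steady-state measurement.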

and many more to come!
