From b67d8936c806f5ea731ef81088f7940c7e14777f Mon Sep 17 00:00:00 2001
From: Vincent Moens
Date: Thu, 12 Sep 2024 11:22:20 +0100
Subject: [PATCH] [Doc] Document losses in README.md

ghstack-source-id: b75d4e08349532b001c91ea3ae5f1e796de26ec5
Pull Request resolved: https://github.com/pytorch/rl/pull/2408
---
 README.md | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 273 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 47189b758e0..8234d21bc2f 100644
--- a/README.md
+++ b/README.md
@@ -523,19 +523,279 @@ If you would like to contribute to new features, check our [call for contributio
 ## Examples, tutorials and demos

 A series of [examples](https://github.com/pytorch/rl/blob/main/examples/) are provided with an illustrative purpose:
-- [DQN](https://github.com/pytorch/rl/blob/main/sota-implementations/dqn)
-- [DDPG](https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py)
-- [IQL](https://github.com/pytorch/rl/blob/main/sota-implementations/iql/iql_offline.py)
-- [CQL](https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py)
-- [TD3](https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py)
-- [TD3+BC](https://github.com/pytorch/rl/blob/main/sota-implementations/td3+bc/td3+bc.py)
-- [A2C](https://github.com/pytorch/rl/blob/main/examples/a2c_old/a2c.py)
-- [PPO](https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/ppo.py)
-- [SAC](https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py)
-- [REDQ](https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py)
-- [Dreamer](https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py)
-- [Decision Transformers](https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer)
-- [RLHF](https://github.com/pytorch/rl/blob/main/examples/rlhf)
+| Algorithm | Compile Support** | Tensordict-free API | Modular Losses | Continuous and Discrete |
+|---|---|---|---|---|
+| DQN | 1.53x | + | NA | + (through ActionDiscretizer transform) |
+| DDPG | 1.54x | + | + | - (continuous only) |
+| IQL | 2.55x | + | + | + |
+| CQL | 1.91x | + | + | + |
+| TD3 | 1.79x | + | + | - (continuous only) |
+| TD3+BC | untested | + | + | - (continuous only) |
+| A2C | 1.76x | + | - | + |
+| PPO | 2.67x | + | - | + |
+| SAC | 2.01x | + | - | + |
+| REDQ | 2.35x | + | - | - (continuous only) |
+| Dreamer v1 | untested | + | + (different classes) | - (continuous only) |
+| Decision Transformers | untested | + | NA | - (continuous only) |
+| CrossQ | untested | + | + | - (continuous only) |
+| Gail | untested | + | NA | + |
+| Impala | untested | + | - | + |
+| IQL (MARL) | untested | + | + | + |
+| DDPG (MARL) | untested | + | + | - (continuous only) |
+| PPO (MARL) | untested | + | - | + |
+| QMIX-VDN (MARL) | untested | + | NA | + |
+| SAC (MARL) | untested | + | - | + |
+| RLHF | NA | + | NA | NA |
+
+** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on
+   architecture and device

 and many more to come!