Soft Actor-Critic (SAC) is an off-policy algorithm based on the Maximum Entropy Reinforcement Learning framework. The main idea behind Maximum Entropy RL is to frame the decision-making problem as a graphical model from top to bottom, and then solve it using tools borrowed from the field of Probabilistic Graphical Models. Under this framework, a learning agent seeks to maximize both the return and the entropy of its policy simultaneously. This approach benefits Deep Reinforcement Learning algorithms by giving them the capacity to consider and learn many alternate paths leading to an optimal goal, and the capacity to learn how to act optimally despite adverse circumstances.
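For reference, the maximum entropy objective (as formulated in the SAC paper referenced below) augments the usual expected return with a policy-entropy bonus weighted by a temperature coefficient α:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

Setting α to zero recovers the standard RL objective, while larger values push the agent toward more stochastic, exploratory behavior.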
Since SAC is an off-policy algorithm, it has the ability to train on samples coming from a different policy. What is particular though is that, contrary to other off-policy algorithms, it is stable. This means that the algorithm is much less picky in terms of hyperparameter tuning.
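As a rough illustration of the off-policy ingredient, here is a minimal replay-buffer sketch (hypothetical names, not the implementation used in this repository): transitions collected by any past behaviour policy are stored and later sampled uniformly to update the current networks.

```python
import random

class ReplayBuffer:
    """Minimal circular buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # position to overwrite once the buffer is full

    def add(self, transition):
        # Transitions may come from any behaviour policy, past or present.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size=256):
        # Uniformly sampled mini-batch used for the off-policy gradient updates.
        return random.sample(self.storage, batch_size)
```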
SAC is currently a state-of-the-art Deep Reinforcement Learning algorithm, together with Twin Delayed Deep Deterministic policy gradient (TD3).
The learning curve of the Maximum Entropy RL framework is quite steep due to its depth and to how much it rethinks the RL problem, but working through it was definitely required in order to understand how SAC works. Tackling the applied part was arguably the most difficult project I have done to date, both in terms of components to implement and silent-bug difficulties. Nevertheless, I'm particularly proud of the result.
See my blog post Soft Actor-Critic part 1: intuition and theoretical aspect for more details on SAC and MaxEnt-RL.
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
- Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review
- Soft Actor-Critic Algorithms and Applications
- Reinforcement Learning with Deep Energy-Based Policies
- Deterministic Policy Gradient Algorithms
- Reinforcement learning: An introduction
I've also complemented my reading with the following resources:
- CS 294-112 Deep Reinforcement Learning: lectures 14-15 by Sergey Levine, UC Berkeley
- OpenAI: Spinning Up: Soft Actor-Critic by Josh Achiam;
- and also the Lil'Log blog post Policy Gradient Algorithms by Lilian Weng, research intern at OpenAI
Download the essay pdf: Deep Reinforcement Learning – Soft Actor-Critic
Note: You can get an explanation of how to use the package with the --help flag.
cd DRLimplementation
python -m SoftActorCritic [--playLunar | --playHardLunar | --playPendulum] [--record]
    [--play_for=<max trajectories> (default=10)] [--harderEnvCoeficient=1.6 (default)]
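For example, to watch a trained agent on the LunarLander environment for 5 trajectories and record the run (combining the flags listed above; the exact `--play_for=5` argument form is an assumption based on the usage line):

cd DRLimplementation
python -m SoftActorCritic --playLunar --record --play_for=5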
cd DRLimplementation
python -m SoftActorCritic <trainExperimentSpecification> [--rerun] [--renderTraining]
Choose <trainExperimentSpecification> from the following (an example invocation is shown after this list):
- For the BipedalWalker-v2 environment: [--trainBipedalWalker]: Train a Soft Actor-Critic agent on the BipedalWalker gym environment
- For the Pendulum-v0 environment: [--trainPendulum]: Train a Soft Actor-Critic agent on the Pendulum gym environment
- For the LunarLanderContinuous-v2 environment: [--trainLunarLander]: Train a Soft Actor-Critic agent on the LunarLander gym environment
- Experimentation utility: [--trainExperimentBuffer]: Run a batch of experiment specifications
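For instance, to train the Pendulum agent with rendering enabled (an example invocation composed from the flags listed above):

cd DRLimplementation
python -m SoftActorCritic --trainPendulum --renderTraining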
cd DRLimplementation
tensorboard --logdir=SoftActorCritic/graph