12 Advanced actor-critic methods


In this chapter

  • You will learn about more advanced deep reinforcement learning methods, which remain, to this day, the state of the art in deep reinforcement learning.
  • You will learn about solving a variety of deep reinforcement learning problems, from problems with continuous action spaces to problems with high-dimensional action spaces.
  • You will build state-of-the-art actor-critic methods from scratch and open the door to understanding more advanced concepts related to artificial general intelligence.

In the last chapter, you learned about a different, more direct technique for solving deep reinforcement learning problems. You were first introduced to policy-gradient methods, in which agents learn policies by approximating them directly. In pure policy-gradient methods, we don’t use value functions as a proxy for finding policies; in fact, we don’t use value functions at all. Instead, we learn stochastic policies directly.

DDPG: Approximating a deterministic policy

DDPG uses many tricks from DQN

Learning a deterministic policy

Exploration with deterministic policies

TD3: State-of-the-art improvements over DDPG

Double learning in DDPG

Smoothing the targets used for policy updates

Delaying updates

SAC: Maximizing the expected return and entropy

Adding the entropy to the Bellman equations

Learning the action-value function

Learning the policy

Automatically tuning the entropy coefficient

PPO: Restricting optimization steps

Using the same actor-critic architecture as A2C

Batching experiences

Clipping the policy updates

Clipping the value function updates

