chapter six

6 Stabilizing value-based deep reinforcement learning methods


In this chapter:

  • You'll improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
  • You'll explore advanced value-based deep reinforcement learning methods, and the many components that make value-based methods better.
  • You'll implement more complex exploration strategies and flexible loss functions with function approximation.
  • You'll solve the cart-pole environment with fewer samples and with more reliable, consistent results.

"There are times I am happy. There are times I am sad. But I always try to separate emotion from the need to reach for something stronger, deeper. And then no matter the emotion, I can reach for a stability that helps me accomplish what is the goal."

Troy Polamalu, former American football strong safety of Samoan descent

6.1   DQN: Making reinforcement learning more like supervised learning

Common problems in value-based deep reinforcement learning

It's important that we clearly understand the two most common problems that consistently show up in value-based deep reinforcement learning: the violation of the IID assumption when training on correlated online experiences, and the non-stationarity of the targets we regress toward.
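
As a preview of how DQN addresses these two problems, here is a minimal sketch, assuming PyTorch and a cart-pole-sized observation space. It pairs a replay buffer, which decorrelates samples so minibatches look more like the IID data supervised learning expects, with a target network, which holds the regression targets fixed between synchronizations. The names, network sizes, and hyperparameters below are illustrative assumptions, not the book's implementation.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class ReplayBuffer:
    """Stores transitions so minibatches can be sampled roughly independently."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        # transition is a (state, action, reward, next_state, done) tuple
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the temporal correlation of experiences
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def make_q_net(obs_dim=4, n_actions=2):
    # Tiny fully connected Q-network for a cart-pole-sized problem
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))


online_net = make_q_net()
target_net = make_q_net()
target_net.load_state_dict(online_net.state_dict())  # frozen copy, re-synced periodically


def dqn_targets(batch, gamma=0.99):
    # Compute TD targets with the *target* network so the targets
    # don't shift every time the online network is updated.
    states, actions, rewards, next_states, dones = zip(*batch)
    next_states = torch.as_tensor(np.array(next_states), dtype=torch.float32)
    rewards = torch.as_tensor(rewards, dtype=torch.float32)
    dones = torch.as_tensor(dones, dtype=torch.float32)
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * max_next_q * (1.0 - dones)

With these two pieces in place, each gradient step regresses the online network's predictions toward targets computed from a fixed copy of itself, on a minibatch drawn uniformly from past experience, which is what makes the setup look much more like supervised learning.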

6.2   Double DQN: Mitigating the overestimation of approximate action-value functions

6.3   Summary