9 More stable value-based methods


In this chapter:

  • You improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
  • You explore advanced value-based deep reinforcement learning methods, and the many components that make value-based methods better.
  • You solve the cart-pole environment in fewer samples, and with more reliable and consistent results.

Let thy step be slow and steady, that thou stumble not.

— Tokugawa Ieyasu, Founder and first shōgun of the Tokugawa shogunate of Japan, and one of the three unifiers of Japan

In the last chapter, you learned about value-based deep reinforcement learning. NFQ, the algorithm we developed, is a simple solution to the two most common issues value-based methods face: first, that data in RL is not independent and identically distributed; and second, that the targets we regress toward are non-stationary.
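The sections that follow develop the two standard fixes for these issues: experience replay, which makes mini-batches look closer to independent and identically distributed, and a target network, which holds the regression targets fixed between periodic updates. The snippet below is only a minimal sketch of both ideas, using a plain Python deque and NumPy arrays rather than the book's PyTorch code; the names and sizes are placeholders, not the chapter's actual implementation.

import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Stores past experiences so training batches can be sampled at random."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, experience):
        # experience is a (state, action, reward, next_state, done) tuple
        self.buffer.append(experience)

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the correlation between
        # consecutive time steps in an episode.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def update_target_weights(online_weights, target_weights, update_now):
    """Copy the online weights into the target network only every so often.

    Here the 'networks' are just lists of NumPy arrays; in practice they
    would be neural-network modules. Freezing the target weights between
    copies keeps the regression targets stationary for a while.
    """
    if update_now:
        return [w.copy() for w in online_weights]
    return target_weights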

9.1   DQN: Making reinforcement learning more like supervised learning

9.1.1   Common problems in value-based deep reinforcement learning

9.1.2   Using target networks

9.1.3   Using larger networks

9.2   Using experience replay

9.2.1   Using other exploration strategies

9.3   Double DQN: Mitigating the overestimation of action-value functions

9.3.1   The problem of overestimation, take two

9.3.2   Separating action selection and action evaluation

9.3.3   A solution

9.3.4   A more practical solution

9.3.5   A more forgiving loss function

9.3.6   Things we can still improve on

9.4   Summary
