9 More stable value-based methods

 

In this chapter

  • You will improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
  • You will explore advanced value-based deep reinforcement learning methods, and the many components that make value-based methods better.
  • You will solve the cart-pole environment with fewer samples and with more reliable and consistent results.

Let thy step be slow and steady, that thou stumble not.

— Tokugawa Ieyasu, founder and first shōgun of the Tokugawa shogunate of Japan, and one of the three unifiers of Japan

DQN: Making reinforcement learning more like supervised learning
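
To make the analogy with supervised learning concrete, here is a minimal sketch, assuming PyTorch and illustrative names (online_net, target_net, batch): a batch of transitions is treated as a regression problem in which the states are the inputs and bootstrapped TD targets play the role of temporarily fixed labels.

import torch
import torch.nn.functional as F

def q_learning_loss(online_net, target_net, batch, gamma=0.99):
    # Unpack a batch of transitions; each element is assumed to be a tensor
    states, actions, rewards, next_states, dones = batch

    # Labels: bootstrapped TD targets, computed without tracking gradients
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * max_next_q * (1 - dones)

    # Predictions: Q-values of the actions actually taken
    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q_values, targets)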

 
 

Common problems in value-based deep reinforcement learning

 

Using target networks
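
The core of the technique can be sketched in a few lines, assuming PyTorch; the network sizes and the sync interval below are illustrative choices, not prescribed values.

import copy
import torch.nn as nn

# Illustrative online network for a 4-dimensional state, 2-action problem
online_net = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# The target network is a frozen copy; it provides the regression labels
target_net = copy.deepcopy(online_net)
for p in target_net.parameters():
    p.requires_grad = False  # never trained directly

SYNC_EVERY = 500  # steps between refreshes (a tunable hyperparameter)

def maybe_sync(step):
    # Periodically refresh the frozen copy so the targets catch up
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(online_net.state_dict())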

 
 
 
 

Using larger networks
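
As a rough illustration only, a "larger network" here simply means a wider and deeper Q-function approximator; the layer widths in this sketch are arbitrary examples, assuming PyTorch.

import torch.nn as nn

class QNetwork(nn.Module):
    # A fully connected Q-network; the hidden widths are only an example
    def __init__(self, state_dim=4, n_actions=2, hidden=(512, 128)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], n_actions))

    def forward(self, x):
        return self.net(x)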

 
 
 
 

Using experience replay
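
A uniform replay buffer can be sketched in a few lines; the class below is illustrative (the capacity and batch size are arbitrary choices), not the book's implementation.

import random
from collections import deque

class ReplayBuffer:
    # Sampling past transitions uniformly at random breaks the temporal
    # correlation of online data and lets each experience be reused many times
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        return tuple(zip(*batch))  # tuples of states, actions, rewards, ...

    def __len__(self):
        return len(self.buffer)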

 
 

Using other exploration strategies
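
As one example of an alternative to a constant epsilon, here is a sketch of an exponentially decaying epsilon-greedy strategy; every constant in the schedule is an illustrative choice.

import numpy as np

def exp_decay_epsilon(step, eps_start=1.0, eps_end=0.01, decay_steps=20_000):
    # Starts at eps_start and decays exponentially toward eps_end
    return eps_end + (eps_start - eps_end) * np.exp(-step / decay_steps)

def select_action(q_values, step, rng=np.random.default_rng()):
    # Epsilon-greedy with the decaying schedule; q_values is a 1-D array
    if rng.random() < exp_decay_epsilon(step):
        return int(rng.integers(len(q_values)))   # explore: random action
    return int(np.argmax(q_values))               # exploit: greedy action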

 

Double DQN: Mitigating the overestimation of action-value functions

 
 

The problem of overestimation, take two
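
A small numerical illustration of the bias, using NumPy: even when every individual action-value estimate is unbiased, the max over those estimates is biased upward.

import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(5)                 # every action is truly worth 0
estimates = true_q + rng.normal(0.0, 1.0, size=(10_000, 5))  # unbiased noise

print(estimates.mean())              # close to 0: each estimate is unbiased
print(estimates.max(axis=1).mean())  # about 1.16: the max is biased upward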

 
 
 
 

Separating action selection from action evaluation
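
Continuing the NumPy illustration from the previous section, the sketch below separates the two roles: one set of noisy estimates selects the action, and an independent set evaluates it, which removes the upward bias.

import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(5)
q_select = true_q + rng.normal(0.0, 1.0, size=(10_000, 5))  # one estimator
q_eval = true_q + rng.normal(0.0, 1.0, size=(10_000, 5))    # an independent one

single = q_select.max(axis=1).mean()                # about 1.16: biased upward
picked = q_select.argmax(axis=1)                    # selection by one estimator
double = q_eval[np.arange(10_000), picked].mean()   # evaluation by the other
print(single, double)                               # the double estimate is near 0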

 
 
 

A solution

 
 
 

A more practical solution
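
A sketch of this idea, assuming PyTorch and illustrative names: the online network selects the next action, and the target network evaluates it when forming the target.

import torch
import torch.nn.functional as F

def double_dqn_loss(online_net, target_net, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    with torch.no_grad():
        # Selection: argmax according to the online network
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # Evaluation: value of that action according to the target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * next_q * (1 - dones)

    q_values = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q_values, targets)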

 
 

A more forgiving loss function
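
A sketch of the Huber loss, which is quadratic for errors smaller than delta and linear beyond it; torch.nn.SmoothL1Loss offers a built-in equivalent for delta=1.

import torch

def huber_loss(pred, target, delta=1.0):
    # Quadratic for small errors, linear for large ones, so a single
    # wildly off-target bootstrapped label cannot dominate the gradient
    error = (target - pred).abs()
    quadratic = torch.clamp(error, max=delta)
    linear = error - quadratic
    return (0.5 * quadratic ** 2 + delta * linear).mean()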

 
 
 

Things we can still improve on

 
 
 

Summary

 
 
 