
9 More stable value-based methods


In this chapter:

  • You improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
  • You explore advanced value-based deep reinforcement learning methods and the components that make them more effective.
  • You solve the cart-pole environment with fewer samples, and with more reliable and consistent results.

Let thy step be slow and steady, that thou stumble not.

— Tokugawa Ieyasu, founder and first shōgun of the Tokugawa shogunate, and one of the three unifiers of Japan

9.1     DQN: Making reinforcement learning more like supervised learning

9.1.1     Common problems in value-based deep reinforcement learning

9.1.2     Using target networks

9.1.3     Using larger networks

9.1.4     Using experience replay

9.1.5     Using other exploration strategies

9.2     Double DQN: Mitigating the overestimation of action-value functions

9.2.1     The problem of overestimation, take two

9.2.2     Separating action selection and action evaluation

9.2.3     A solution

9.2.4     A more practical solution