9 More stable value-based methods


In this chapter

  • You will improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
  • You will explore advanced value-based deep reinforcement learning methods, and the many components that make value-based methods better.
  • You will solve the cart-pole environment with fewer samples, and with more reliable and consistent results.

Let thy step be slow and steady, that thou stumble not.

— Tokugawa Ieyasu, founder and first shōgun of the Tokugawa shogunate of Japan, and one of the three unifiers of Japan

DQN: Making reinforcement learning more like supervised learning

Common problems in value-based deep reinforcement learning

Using target networks

Using larger networks

Using experience replay

Using other exploration strategies

Double DQN: Mitigating the overestimation of action-value functions

The problem of overestimation, take two

Separating action selection from action evaluation

A solution

A more practical solution

A more forgiving loss function

Things we can still improve on

Summary
