This chapter covers
- Why a full probability distribution is better than a single number
- Extending ordinary deep Q-networks to output full probability distributions over Q values
- Implementing a distributional variant of DQN to play Atari Freeway
- Understanding the ordinary Bellman equation and its distributional variant
- Prioritizing experience replay to improve training speed
We introduced Q-learning in chapter 3 as a way to determine the value of taking each possible action in a given state; these values are called action values or Q values. A policy then uses these action values to select actions, for example by choosing the action with the highest value. In this chapter we extend Q-learning to learn not just a point estimate of each action value but an entire probability distribution over action values for each action; this is called distributional Q-learning. Distributional Q-learning has been shown to deliver dramatically better performance on standard benchmarks, and it also allows for more nuanced decision-making, as you will see. Distributional Q-learning, combined with some of the other techniques covered in this book, is currently considered a state-of-the-art advance in reinforcement learning.
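To make the distinction concrete, here is a minimal PyTorch sketch (not the implementation we build later in the chapter) of a network head that outputs a probability distribution over a fixed set of candidate Q values, or "atoms," for each action, rather than one Q value per action. The layer sizes, the 51-atom support, the value range, and the three-action setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DistributionalQNet(nn.Module):
    """Sketch of a distributional Q-network: for each action it outputs a
    probability distribution over a fixed support of candidate Q values,
    instead of a single point estimate."""
    def __init__(self, state_dim=128, n_actions=3, n_atoms=51,
                 v_min=-10.0, v_max=10.0):
        super().__init__()
        self.n_actions = n_actions
        self.n_atoms = n_atoms
        # Fixed, evenly spaced support of candidate Q values (the "atoms")
        self.register_buffer("support", torch.linspace(v_min, v_max, n_atoms))
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions * n_atoms),
        )

    def forward(self, state):
        logits = self.net(state).view(-1, self.n_actions, self.n_atoms)
        # Softmax over the atom dimension yields a valid probability
        # distribution for each action
        return torch.softmax(logits, dim=2)

    def q_values(self, state):
        # An ordinary point estimate is recovered as the expectation
        # (probability-weighted mean) of each action's distribution
        probs = self.forward(state)
        return (probs * self.support).sum(dim=2)

# Usage: compare the full distribution to the point estimate it summarizes
model = DistributionalQNet()
state = torch.randn(1, 128)        # hypothetical state vector
dist = model(state)                # shape (1, 3, 51): one distribution per action
q = model.q_values(state)          # shape (1, 3): one expected Q value per action
```

The point of the extra output dimension is that two actions can share the same expected Q value while having very different distributions (one narrow and safe, one wide and risky), and only the distributional output lets the agent tell them apart.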