
7 Distributional DQN: Getting the full story


In this chapter we learn…

  • Why knowing a full probability distribution is better than a single number
  • How to extend ordinary Deep Q-networks to output full probability distributions over Q-values
  • How to implement a distributional variant of DQN to play Atari Freeway
  • How the ordinary Bellman equation relates to its distributional variant
  • How to prioritize experience replay to improve training speed

We introduced Q-learning back in Chapter 3 as a way to learn the value of taking each possible action in a given state; these values are called action-values or Q-values. Q-learning let us apply a policy to these action-values and choose the action associated with the highest value. In this chapter we extend Q-learning so that instead of learning a single point estimate of each action-value, we learn an entire probability distribution of action-values for each action; this approach is called distributional Q-learning. Distributional Q-learning has been shown to result in dramatically better performance on standard benchmarks, and it also allows for more nuanced decision-making, as we will see. Distributional Q-learning, combined with some of the other techniques covered in this book, is currently considered a state-of-the-art advance in reinforcement learning.
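To make the contrast concrete before we dive in, here is a minimal NumPy sketch (not code from this chapter; the support range, the number of atoms, and the gaussian_probs helper are illustrative assumptions). It shows how two actions can share the same expected Q-value while having very different distributions over outcomes, which is exactly the information a single point estimate throws away.

```python
import numpy as np

# Discrete "support": the candidate Q-values a distribution is defined over.
# The range and number of atoms here are illustrative assumptions.
support = np.linspace(-10, 10, 51)

# An ordinary DQN would summarize 3 actions with one number each:
point_estimates = np.array([1.0, 2.5, 2.5])

# A distributional DQN instead keeps one probability vector per action.
def gaussian_probs(mean, std):
    """Discretized bell curve over the support, normalized to sum to 1."""
    p = np.exp(-0.5 * ((support - mean) / std) ** 2)
    return p / p.sum()

dist_outputs = np.stack([
    gaussian_probs(1.0, 1.0),
    gaussian_probs(2.5, 0.5),   # action 1: expected value 2.5, narrow spread
    gaussian_probs(2.5, 3.0),   # action 2: similar expected value, wide spread
])

# Taking the expectation recovers roughly what ordinary Q-learning reports...
expected_q = dist_outputs @ support
print(expected_q)   # actions 1 and 2 have nearly the same expected value

# ...but the variance exposes information a point estimate discards:
variance = (dist_outputs * (support - expected_q[:, None]) ** 2).sum(axis=1)
print(variance)     # action 2's distribution is far wider, i.e. riskier
```

With only point estimates, actions 1 and 2 look interchangeable; with full distributions, an agent (or a risk-sensitive policy) can distinguish a reliably good action from a gamble with the same average payoff.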

7.1   What’s wrong with Q-learning?

7.2   Probability and statistics revisited

7.2.1   Priors and posteriors

7.2.2   Expectation and variance

7.3   The Bellman equation (optional)

7.4   Distributional Q-learning

7.4.1   Representing a probability distribution in Python

7.4.2   Implementing the Dist-DQN

7.5   Comparing probability distributions

7.6   Dist-DQN on simulated data

7.7   Distributional Q-learning to play Freeway

7.8   Summary