
7 Distributional DQN: Getting the full story


In this chapter we learn…

  • Why knowing a full probability distribution is better than a single number
  • How to extend ordinary Deep Q-networks to output full probability distributions over Q-values
  • How to implement a distributional variant of DQN to play Atari Freeway
  • How the ordinary Bellman equation relates to its distributional variant
  • How to prioritize experience replay to improve training speed

We introduced Q-learning back in Chapter 3 as a way to learn the value of taking each possible action in a given state; these values are called action-values or Q-values. Q-learning let us apply a policy to these action-values and choose the action associated with the highest value. In this chapter we extend Q-learning so that instead of learning a single point estimate of each action-value, we learn an entire probability distribution of action-values for each action; this approach is called distributional Q-learning. Distributional Q-learning has been shown to result in dramatically better performance on standard benchmarks, and it also allows for more nuanced decision-making, as we will see. Distributional Q-learning, combined with some of the other techniques covered in this book, is currently considered a state-of-the-art advance in reinforcement learning.
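To make the contrast concrete before we dive in, here is a minimal NumPy sketch (not code from this chapter; the support range, the number of atoms, and the gaussian_probs helper are illustrative assumptions). It shows how two actions can share the same expected Q-value while having very different distributions over outcomes, which is exactly the information a single point estimate throws away.

```python
import numpy as np

# Discrete "support": the candidate Q-values a distribution is defined over.
# The range and number of atoms here are illustrative assumptions.
support = np.linspace(-10, 10, 51)

# An ordinary DQN would summarize 3 actions with one number each:
point_estimates = np.array([1.0, 2.5, 2.5])

# A distributional DQN instead keeps one probability vector per action.
def gaussian_probs(mean, std):
    """Discretized bell curve over the support, normalized to sum to 1."""
    p = np.exp(-0.5 * ((support - mean) / std) ** 2)
    return p / p.sum()

dist_outputs = np.stack([
    gaussian_probs(1.0, 1.0),
    gaussian_probs(2.5, 0.5),   # action 1: expected value 2.5, narrow spread
    gaussian_probs(2.5, 3.0),   # action 2: similar expected value, wide spread
])

# Taking the expectation recovers roughly what ordinary Q-learning reports...
expected_q = dist_outputs @ support
print(expected_q)   # actions 1 and 2 have nearly the same expected value

# ...but the variance exposes information a point estimate discards:
variance = (dist_outputs * (support - expected_q[:, None]) ** 2).sum(axis=1)
print(variance)     # action 2's distribution is far wider, i.e. riskier
```

With only point estimates, actions 1 and 2 look interchangeable; with full distributions, an agent (or a risk-sensitive policy) can distinguish a reliably good action from a gamble with the same average payoff.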

7.1   What’s wrong with Q-learning?

7.2   Probability and statistics revisited

7.2.1   Priors and posteriors

7.2.2   Expectation and variance

7.3   The Bellman equation (optional)

7.4   Distributional Q-learning

7.4.1   Representing a probability distribution in Python

7.4.2   Implementing the Dist-DQN

7.5   Comparing probability distributions

7.6   Dist-DQN on simulated data

7.7   Distributional Q-learning to play Freeway

7.8   Summary