Chapter 10. Reinforcement learning with policy gradients

 

This chapter covers

  • Improving game play with policy gradient learning
  • Implementing policy gradient learning in Keras
  • Tuning optimizers for policy gradient learning

Chapter 9 showed you how to make a Go-playing program play against itself and save the results as experience data. That’s the first half of reinforcement learning; the next step is to use that experience data to improve the agent so that it wins more often. The agent from the previous chapter used a neural network to select which move to play. As a thought experiment, imagine you shift every weight in the network by a random amount. Then the agent will select different moves. Just by luck, some of those new moves will be better than the old ones; others will be worse. On balance, the updated agent might be slightly stronger or weaker than the previous version. Which way it goes is up to chance.

Can you improve on that? This chapter covers a form of policy gradient learning. Policy gradient methods provide a scheme for estimating which direction to shift the weights in order to make the agent better at its task. Instead of randomly shifting each weight, you can analyze the experience data to guess whether it’s better to increase or decrease a particular weight. Randomness still plays a role, but policy gradient learning improves your odds.
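
To make that idea concrete before the chapter dives in, here is a rough sketch of the weighting trick in Keras, in the style of the earlier chapters. The network shape, the random stand-in data, and the use of sample_weight to scale each training example by its game outcome are illustrative assumptions, not the book’s exact implementation.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

board_size = 9
num_points = board_size * board_size

# A toy policy network: flattened board features in, a probability
# distribution over all board points out. Layer sizes are illustrative.
model = Sequential([
    Dense(128, activation='relu', input_shape=(num_points,)),
    Dense(num_points, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='sgd')

# Stand-in experience data: board states, the moves the agent chose
# (one-hot encoded), and the final result of each game (+1 win, -1 loss).
states = np.random.random((100, num_points))
chosen_moves = np.eye(num_points)[np.random.randint(num_points, size=100)]
outcomes = np.random.choice([1.0, -1.0], size=100)

# The core policy gradient idea: weight each example by its game outcome.
# Moves from won games get a positive weight, so gradient descent raises
# their probability; moves from lost games get a negative weight, so the
# same update lowers their probability instead.
model.fit(states, chosen_moves, sample_weight=outcomes, batch_size=32, epochs=1)

In this framing, a pass over the self-play data nudges the network toward moves that appeared in wins and away from moves that appeared in losses. Chance still matters, because a strong move can show up in a lost game and a weak move in a won one, but over many games the useful signal outweighs the noise.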

10.1. How random games can identify good decisions

10.2. Modifying neural network policies with gradient descent

10.3. Tips for training with self-play

10.4. Summary
