Chapter 12. Reinforcement learning with actor-critic methods

This chapter covers

  • Using advantage to make reinforcement learning more efficient
  • Making a self-improving game AI with the actor-critic method
  • Designing and training multi-output neural networks in Keras

If you’re learning to play Go, one of the best ways to improve is to get a stronger player to review your games. Sometimes the most useful feedback just points out where you won or lost the game. The reviewer might give comments like, “You were already far behind by move 30” or “At move 110, you had a winning position, but your opponent turned it around by move 130.”

Why is this feedback helpful? You may not have time to scrutinize all 300 moves in a game, but you can focus your full attention on a 10- or 20-move sequence. The reviewer lets you know which parts of the game are important.

Reinforcement-learning researchers apply this principle in actor-critic learning, which is a combination of policy learning (as covered in chapter 10) and value learning (as covered in chapter 11). The policy function plays the role of the actor: it picks what moves to play. The value function is the critic: it tracks whether the agent is ahead or behind in the course of the game. That feedback guides the training process, in the same way that a game review can guide your own study.
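To make this concrete, here is a minimal sketch of what such a network can look like in Keras: one shared stack of layers feeding two output heads, a policy head for the actor and a value head for the critic. The board-encoder shape (11 planes on a 19 × 19 board), the layer sizes, and the loss weights are illustrative assumptions, not the exact architecture this chapter builds in section 12.2.

    from keras.models import Model
    from keras.layers import Conv2D, Dense, Flatten, Input

    # Shared layers that process the encoded board position.
    # The input shape and layer sizes here are assumptions for illustration.
    board_input = Input(shape=(19, 19, 11), name='board_input')
    conv = Conv2D(64, (3, 3), padding='same', activation='relu')(board_input)
    conv = Conv2D(64, (3, 3), padding='same', activation='relu')(conv)
    flat = Flatten()(conv)
    processed = Dense(512, activation='relu')(flat)

    # The actor: a probability distribution over all 361 board points.
    policy_output = Dense(19 * 19, activation='softmax', name='policy')(processed)
    # The critic: a single number estimating how good the position is.
    value_output = Dense(1, activation='tanh', name='value')(processed)

    model = Model(inputs=board_input,
                  outputs=[policy_output, value_output])
    model.compile(
        optimizer='sgd',
        loss=['categorical_crossentropy', 'mse'],
        loss_weights=[1.0, 0.5])   # relative weighting of the two heads is an assumption

Because the two heads share one network, a single call to model.predict() returns both a move distribution and a position estimate, so the agent can act and critique its situation from one forward pass.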

12.1. Advantage tells you which decisions are important

12.2. Designing a neural network for actor-critic learning

12.3. Playing games with an actor-critic agent

12.4. Training an actor-critic agent from experience data

12.5. Summary
