Chapter 11. Reinforcement learning with value methods

This chapter covers

  • Making a self-improving game AI with the Q-learning algorithm
  • Defining and training multi-input neural networks in Keras
  • Building and training a Q-learning agent by using Keras

Have you ever read an expert commentary on a high-level chess or Go tournament game? You’ll often see comments like, “Black is far behind at this point” or “The result up to here is slightly better for white.” What does it mean to be “ahead” or “behind” in the middle of such a strategy game? This isn’t basketball, with a running score to refer to. Instead, the commentator means that the board position is favorable to one player or the other. If you want to be precise, you could define it with a thought experiment. Find a hundred evenly matched pairs of players. Give each pair the board position from the middle of the game, and tell them to start playing from there. If the player taking black wins a small majority of the games—say, 55 out of 100—you can say the position was slightly good for black.
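The thought experiment above is really a Monte Carlo estimate of a position's value. Here's a minimal sketch of that idea in Python; the simulate_game helper and its biased coin flip are purely illustrative stand-ins, not part of the book's library:

    import random

    def simulate_game(position):
        # Stand-in for two evenly matched players finishing the game
        # from `position`. A real implementation would play out actual
        # moves; a biased coin flip here illustrates a position that
        # slightly favors black.
        return 'black' if random.random() < 0.55 else 'white'

    def estimate_value(position, num_games=100):
        # Monte Carlo version of the thought experiment: play many
        # games from `position` and report black's win rate.
        black_wins = sum(1 for _ in range(num_games)
                         if simulate_game(position) == 'black')
        return black_wins / num_games

    print(estimate_value(position=None))   # around 0.55: slightly good for black

A win rate near 0.5 means the position is even; the further the estimate moves from 0.5, the stronger the advantage for one side.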

Of course, the commentators are doing no such thing. Instead, they’re relying on their own intuition, built up over thousands of games, to make a judgment about what might happen. In this chapter, we show how to train a computer game player to make similar judgments. And the computer will learn to do it in much the same way a human does: by playing many, many games.
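The specific technique this chapter builds on is Q-learning, in which the agent learns an action-value function Q(s, a) from its own games. Before the Keras version in section 11.2, here is a rough sketch of the standard tabular update rule for orientation; all names are illustrative, and this is the generic textbook rule rather than the book's neural-network implementation:

    def q_update(q_table, state, action, reward, next_state, legal_actions,
                 learning_rate=0.1, discount=0.99):
        # One step of standard tabular Q-learning:
        #   Q(s, a) <- Q(s, a) + lr * (r + gamma * max_a' Q(s', a') - Q(s, a))
        # `q_table` maps (state, action) pairs to estimated values.
        best_next = max((q_table.get((next_state, a), 0.0)
                         for a in legal_actions), default=0.0)
        old_value = q_table.get((state, action), 0.0)
        q_table[(state, action)] = old_value + learning_rate * (
            reward + discount * best_next - old_value)

    q = {}
    q_update(q, state='s0', action='a0', reward=1.0, next_state='s1',
             legal_actions=['a0', 'a1'])
    print(q)   # {('s0', 'a0'): 0.1}

A lookup table like this works for tiny games, but a Go board has far too many states to enumerate; that's why the chapter replaces the table with a neural network that estimates Q(s, a) directly.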

11.1. Playing games with Q-learning

11.2. Q-learning with Keras

11.3. Summary
