Chapter 13. AlphaGo: Bringing it all together


This chapter covers

  • Diving into the guiding principles that led Go bots to play at superhuman strength
  • Using tree search, supervised deep learning, and reinforcement learning to build such a bot (a small sketch of the supervised step follows this list)
  • Implementing your own version of DeepMind’s AlphaGo engine

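To make the second bullet concrete, here is a minimal sketch of the supervised-learning step: a small convolutional policy network trained to predict human moves from encoded board positions. This is an illustration only, assuming TensorFlow/Keras is available; the random arrays stand in for the book's encoded game records, and the layer sizes and feature-plane count are placeholder values, not AlphaGo's actual architecture.

# Minimal sketch of supervised policy-network training (illustrative only).
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense

board_size = 19
num_planes = 11                        # feature planes per position (placeholder)
num_moves = board_size * board_size

# Dummy stand-ins for encoded board positions and the human moves played.
X = np.random.random((1000, board_size, board_size, num_planes))
y = np.eye(num_moves)[np.random.randint(num_moves, size=1000)]   # one-hot moves

# A small convolutional policy network: board tensor in, move probabilities out.
model = Sequential([
    Conv2D(64, (3, 3), padding='same', activation='relu',
           input_shape=(board_size, board_size, num_planes)),
    Conv2D(64, (3, 3), padding='same', activation='relu'),
    Flatten(),
    Dense(num_moves, activation='softmax'),
])
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Supervised training: predict the human move for each position.
model.fit(X, y, batch_size=128, epochs=1)

In the chapter itself, the dummy data is replaced by positions encoded from professional game records, and the trained policy network then seeds the self-play and tree-search stages covered in sections 13.2 through 13.4.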
When DeepMind’s Go bot AlphaGo played move 37 of game 2 against Lee Sedol in 2016, it took the Go world by storm. Commentator Michael Redmond, a professional player with nearly a thousand top-level games under his belt, did a double take on air; he even briefly removed the stone from the demo board and looked around, as if to confirm that AlphaGo had made the right move. (“I still don’t really understand the mechanics of it,” Redmond told the American Go E-Journal the next day.) Lee, the world’s dominant player of the past decade, spent 12 minutes studying the board before responding. Figure 13.1 shows the legendary move.

Figure 13.1. The legendary shoulder hit that AlphaGo played against Lee Sedol in the second game of their series. This move stunned many professional players.

13.1. Training deep neural networks for AlphaGo

13.2. Bootstrapping self-play from policy networks

13.3. Deriving a value network from self-play data

13.4. Better search with policy and value networks

13.5. Practical considerations for training your own AlphaGo

13.6. Summary
