Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning


This chapter covers

  • Playing games with a variation on Monte Carlo tree search
  • Integrating tree search into self-play for reinforcement learning
  • Training a neural network to enhance a tree-search algorithm

After DeepMind revealed the second version of AlphaGo, code-named Master, Go fans all over the world scrutinized its startling style of play. Master’s games were full of surprising new moves. Although Master was bootstrapped from human game records, it was continuously enhanced with reinforcement learning, and that training enabled it to discover moves that humans had never played.

This led to an obvious question: what if AlphaGo didn’t rely on human games at all, but instead learned entirely through reinforcement learning? Could it still reach a superhuman level, or would it get stuck at the strength of a beginner? Would it rediscover the patterns played by human masters, or would it invent an incomprehensible, alien style of its own? All these questions were answered when AlphaGo Zero (AGZ) was announced in 2017.

AlphaGo Zero was built on an improved reinforcement-learning system, and it trained itself from scratch without any input from human games. Although its first games were worse than any human beginner’s, AGZ improved steadily and quickly surpassed every previous version of AlphaGo.
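To make the idea of learning “from scratch” concrete, here’s a minimal sketch of the kind of self-play loop this chapter develops. The names self_play_game and train_network are hypothetical stand-ins for the components built in the sections that follow; only the overall loop structure is shown here.

def train_from_scratch(network, self_play_game, train_network,
                       num_generations=100, games_per_generation=1000):
    """Improve a randomly initialized network purely through self-play.

    self_play_game and train_network are hypothetical placeholders for
    the pieces this chapter builds; no human game records are used.
    """
    for generation in range(num_generations):
        experience = []
        for _ in range(games_per_generation):
            # The agent plays both sides of a game, guiding tree search
            # with the current network and recording each position, the
            # search's move preferences, and the final game outcome.
            experience.extend(self_play_game(network))
        # The network learns to predict the search's chosen moves and
        # the game results; the improved network then generates the
        # next generation of self-play games.
        network = train_network(network, experience)
    return network

The key point is the feedback cycle: a stronger network produces stronger self-play games, which in turn produce better training data for the next generation.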

14.1. Building a neural network for tree search

14.2. Guiding tree search with a neural network

14.3. Training

14.4. Improving exploration with Dirichlet noise

14.5. Modern techniques for deeper neural networks

14.6. Exploring additional resources

14.7. Wrapping up

14.8. Summary
