Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning

 

This chapter covers

  • Playing games with a variation on Monte Carlo tree search
  • Integrating tree search into self-play for reinforcement learning
  • Training a neural network to enhance a tree-search algorithm

After DeepMind revealed the second edition of AlphaGo, code-named Master, Go fans all over the world scrutinized its startling style of play. Master’s games were full of surprising new moves. Although it was bootstrapped from human games, Master was continuously enhanced with reinforcement learning, and that enabled it to discover moves that no human had played.

This led to an obvious question: what if AlphaGo didn’t rely on human games at all, but instead learned entirely through reinforcement learning? Could it still reach a superhuman level, or would it get stuck at a beginner’s level? Would it rediscover the patterns played by human masters, or would it develop an incomprehensible, alien style of its own? All these questions were answered when AlphaGo Zero (AGZ) was announced in 2017.

AlphaGo Zero was built on an improved reinforcement-learning system, and it trained itself from scratch without any input from human games. Although its first games were worse than any human beginner’s, AGZ improved steadily and quickly surpassed every previous edition of AlphaGo.

14.1. Building a neural network for tree search
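At the heart of AGZ is a single network with two outputs, or heads: a policy head that produces a probability distribution over every possible move, and a value head that produces a single number estimating how good the current position is for the player to move. The sketch below shows the general shape of such a network in Keras; the layer sizes and the 11-plane input encoding are illustrative assumptions, not AGZ’s real architecture, which is a much deeper residual network (see section 14.5).

from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input

# Encoded board position; the number of input planes depends on your encoder.
board_input = Input(shape=(19, 19, 11), name='board_input')

# Shared body: convolutional layers process the position once for both heads.
x = Conv2D(64, (3, 3), padding='same', activation='relu')(board_input)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
flat = Flatten()(x)

# Policy head: a probability for each of the 361 points plus the pass move.
policy_output = Dense(19 * 19 + 1, activation='softmax', name='policy')(flat)

# Value head: a single number in [-1, 1] estimating who is winning.
value_hidden = Dense(256, activation='relu')(flat)
value_output = Dense(1, activation='tanh', name='value')(value_hidden)

model = Model(inputs=board_input, outputs=[policy_output, value_output])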

 
 
 
 

14.2. Guiding tree search with a neural network
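With such a network in hand, the search can use it to decide which branch to explore next. Each child branch gets a score Q + U: Q is the branch’s average value over the visits it has received so far, and U is an exploration bonus that grows with the network’s prior probability for the move and shrinks as the branch accumulates visits. Here is a minimal sketch of that selection rule; the node and branch structure (a branches dict whose entries carry prior, visit_count, and total_value fields) is a hypothetical stand-in, and c_puct is a tunable constant balancing exploitation against exploration.

import math

def select_branch(node, c_puct=2.0):
    # Total visits across all branches of this node.
    total_visits = sum(b.visit_count for b in node.branches.values())

    def score(branch):
        # Q: average value observed so far (0 for an unvisited branch).
        q = branch.total_value / branch.visit_count if branch.visit_count else 0.0
        # U: exploration bonus, high for high-prior, rarely visited moves.
        u = c_puct * branch.prior * math.sqrt(total_visits) / (1 + branch.visit_count)
        return q + u

    return max(node.branches.values(), key=score)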

 
 
 

14.3. Training
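Self-play generates one training example per move: the encoded board is the input, the normalized visit counts from that move’s search become the target for the policy head, and the final result of the game (+1 for a win, -1 for a loss, from the perspective of the player to move) becomes the target for the value head. Both heads are trained together with a combined loss: cross-entropy for the policy and mean squared error for the value. A minimal sketch, continuing with the model from section 14.1 (the optimizer settings and array names are illustrative):

from tensorflow.keras.optimizers import SGD

# One loss per head: cross-entropy pulls the policy head toward the search's
# visit-count distribution; mean squared error pulls the value head toward
# the actual game outcome.
model.compile(
    optimizer=SGD(learning_rate=0.01, momentum=0.9),
    loss=['categorical_crossentropy', 'mean_squared_error'],
    loss_weights=[1.0, 1.0],
)

# board_tensors, visit_count_targets, and game_outcomes are arrays collected
# during self-play (the names are illustrative):
# model.fit(board_tensors, [visit_count_targets, game_outcomes], batch_size=2048)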

 
 
 

14.4. Improving exploration with Dirichlet noise
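Left to itself, the search keeps revisiting the moves the network already favors, so moves with tiny priors may never get explored during self-play. AGZ counters this by mixing Dirichlet noise into the priors at the root of each search: every move then receives at least a little probability mass, while a small concentration parameter keeps the noise focused on a handful of randomly chosen moves. A minimal sketch with NumPy, using the constants reported for AGZ (epsilon=0.25, alpha=0.03); the function name is illustrative:

import numpy as np

def add_dirichlet_noise(priors, epsilon=0.25, alpha=0.03):
    # Draw one noise sample over all moves; a small alpha concentrates the
    # sample's mass on just a few randomly chosen moves.
    noise = np.random.dirichlet([alpha] * len(priors))
    # Keep (1 - epsilon) of the network's prior and mix in epsilon noise,
    # so every root move gets at least a little exploration.
    return (1 - epsilon) * np.asarray(priors) + epsilon * noise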

 
 
 

14.5. Modern techniques for deeper neural networks
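Networks as deep as AGZ’s are trainable largely thanks to two now-standard techniques: batch normalization, which renormalizes activations between layers to keep gradients well behaved, and residual (skip) connections, which give gradients a direct path around each block. As one possible illustration, here is a single residual block in Keras; the helper name and filter count are assumptions, and the block expects its input tensor to already have the matching number of channels.

from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def residual_block(x, filters=64):
    shortcut = x
    # First convolution/batch-normalization pair.
    y = Conv2D(filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    # Second convolution/batch-normalization pair.
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    # Skip connection: add the block's input back to its output.
    y = Add()([y, shortcut])
    return Activation('relu')(y)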

 
 
 

14.6. Exploring additional resources

 
 
 
 

14.7. Wrapping up

 
 

14.8. Summary

 