Chapter 14. AlphaGo Zero: Integrating tree search with reinforcement learning
This chapter covers
- Playing games with a variation on Monte Carlo tree search
- Integrating tree search into self-play for reinforcement learning
- Training a neural network to enhance a tree-search algorithm
After DeepMind revealed the second edition of AlphaGo, code-named Master, Go fans all over the world scrutinized its shocking style of play. Master’s games were full of surprising new moves. Although Master was bootstrapped from human games, it was continuously enhanced with reinforcement learning, and that enabled it to discover moves that humans didn’t play.
This led to an obvious question: what if AlphaGo didn’t rely on human games at all, but instead learned entirely through reinforcement learning? Could it still reach a superhuman level, or would it get stuck at a beginner’s level? Would it rediscover patterns played by human masters, or would it play in an incomprehensible new alien style? All these questions were answered when AlphaGo Zero (AGZ) was announced in 2017.
AlphaGo Zero was built on an improved reinforcement-learning system, and it trained itself from scratch without any input from human games. Although its first games were worse than any human beginner’s, AGZ improved steadily and quickly surpassed every previous edition of AlphaGo.