10 Basics of Deep Reinforcement Learning

 

This chapter covers:

  • How reinforcement learning (RL) differs from the supervised learning covered in the previous chapters
  • The basic paradigm of reinforcement learning: agent, environment, action, and reward, and the interactions between them
  • The general ideas behind two major approaches to solving RL problems: policy-based and value-based methods
  • A policy-based RL algorithm by example: using the policy gradient (PG) method to solve the cart-pole problem
  • A value-based RL algorithm by example: using a deep Q-network (DQN) to solve the snake game

Up to this point in the book, we have focused primarily on a type of machine learning called supervised learning. In supervised learning, we train a model to give us the correct answer given an input. Whether it’s assigning a class label to an input image (Chapter 4) or predicting future temperature based on past weather data (Chapter 8), the paradigm is the same: mapping a static input to a static output. The sequence-generating models in Chapters 8 and 9 were slightly more complicated in that their output is a sequence of items rather than a single one. But even those problems can be reduced to a one-input, one-output mapping by breaking the sequences into steps.
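
Reinforcement learning, the subject of this chapter, replaces that static mapping with an ongoing interaction between an agent and an environment: the agent observes a state, takes an action, and receives a reward, and the cycle repeats. The short Python sketch below is only an illustration of the shape of that loop, not the chapter's cart-pole or snake implementation; the ToyEnvironment and RandomAgent classes are hypothetical stand-ins invented for this example.

import random

class ToyEnvironment:
    """A made-up 1-D world: the agent starts at position 0 and earns a reward by reaching position 5."""
    def reset(self):
        self.position = 0
        return self.position          # the initial state observed by the agent

    def step(self, action):
        self.position += action       # action is -1 (move left) or +1 (move right)
        reward = 1.0 if self.position == 5 else 0.0
        done = self.position == 5 or abs(self.position) > 10
        return self.position, reward, done

class RandomAgent:
    """A placeholder agent that acts at random; a trained policy or Q-network would go here."""
    def act(self, state):
        return random.choice([-1, 1])

env = ToyEnvironment()
agent = RandomAgent()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = agent.act(state)                  # the agent chooses an action based on the current state
    state, reward, done = env.step(action)     # the environment returns the next state and a reward
    total_reward += reward
print("Total reward for this episode:", total_reward)

The point of the sketch is the loop itself: unlike the one-shot input-to-output mapping of supervised learning, the agent's actions influence the states and rewards it sees later, and its goal is to maximize the cumulative reward over an episode.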

10.1  The formulation of reinforcement-learning problems

10.2  Policy networks and policy gradients: The cart-pole example

10.2.1  Cart-pole as a reinforcement-learning problem

10.2.2  Policy network

10.2.3  Training the policy network: The REINFORCE algorithm

10.3  Value networks and Q-learning: The snake game example

10.3.1  Snake as a reinforcement-learning problem

10.3.2  Markov decision process and Q-values

10.3.3  Deep Q-network

10.3.4  Training the deep Q-network

10.4  Summary

10.5  Materials for further reading

10.6  Exercises