Chapter 11. Basics of deep reinforcement learning

 

This chapter covers

  • How reinforcement learning differs from the supervised learning discussed in the previous chapters
  • The basic paradigm of reinforcement learning: agent, environment, action, and reward, and how they interact with one another
  • The general ideas behind two major approaches to solving reinforcement-learning problems: policy-based and value-based methods

Up to this point in the book, we have focused primarily on a type of machine learning called supervised learning, in which we train a model to produce the correct answer for a given input. Whether it’s assigning a class label to an input image (chapter 4) or predicting future temperature from past weather data (chapters 8 and 9), the paradigm is the same: mapping a static input to a static output. The sequence-generating models we encountered in chapters 9 and 10 were slightly more complex in that the output was a sequence of items rather than a single item, but even those problems can be reduced to a one-input, one-output mapping by breaking the sequence into steps.
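To make the contrast with supervised learning concrete, the sketch below shows the agent-environment-action-reward loop that the rest of this chapter builds on. It is a minimal illustration, not code from the chapter's cart-pole or snake examples: the CorridorEnv environment and the random chooseAction agent are hypothetical stand-ins, written in plain TypeScript with no deep-learning library.

// A toy "corridor" environment (hypothetical, for illustration only): the
// agent starts at position 0 and tries to reach position 4. Reaching the
// goal ends the episode with a reward of +1; every other step costs -0.1.
class CorridorEnv {
  private static readonly GOAL = 4;
  private position = 0;

  reset(): number {
    this.position = 0;
    return this.position;  // Initial observation (state).
  }

  step(action: 'left' | 'right'): { observation: number; reward: number; done: boolean } {
    this.position = Math.max(0, this.position + (action === 'right' ? 1 : -1));
    const done = this.position === CorridorEnv.GOAL;
    const reward = done ? 1.0 : -0.1;
    return { observation: this.position, reward, done };
  }
}

// A placeholder agent that acts at random. A real agent would map the
// observation to an action, for instance with a policy network (section 11.2)
// or a value (Q) network (section 11.3).
function chooseAction(observation: number): 'left' | 'right' {
  return Math.random() < 0.5 ? 'left' : 'right';
}

// The interaction loop: the agent observes the state and selects an action;
// the environment responds with the next state and a reward. Unlike
// supervised learning, no per-step label says which action was "correct".
const env = new CorridorEnv();
let observation = env.reset();
let totalReward = 0;
let done = false;
while (!done) {
  const action = chooseAction(observation);
  const result = env.step(action);
  observation = result.observation;
  totalReward += result.reward;
  done = result.done;
}
console.log(`Episode finished with total reward ${totalReward.toFixed(1)}`);

The point of the loop is structural: the agent's actions change the states it will see next, and the only feedback is a scalar reward rather than a labeled correct output. Sections 11.2 and 11.3 introduce policy-based and value-based methods as two ways for an agent to learn from that reward signal.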

11.1. The formulation of reinforcement-learning problems

11.2. Policy networks and policy gradients: The cart-pole example

11.3. Value networks and Q-learning: The snake game example

Materials for further reading

Exercises

Summary
