chapter eleven
  • How reinforcement learning (RL) differs from the supervised learning covered in the previous chapters
  • The basic paradigm of reinforcement learning: agent, environment, action, and reward, and the interactions between them
  • The general ideas behind two major approaches to solving RL problems: policy-based and value-based methods
  • A policy-based RL algorithm by example: using the policy gradients (PG) method to solve the cart-pole problem
  • A Q-value-based RL algorithm by example: using a deep Q-network (DQN) to solve the snake game
    11.1  The formulation of reinforcement-learning problems

    11.2  Policy networks and policy gradients: The cart-pole example

    11.2.1  Cart-pole as a reinforcement-learning problem

    11.2.2  Policy network

    11.2.3  Training the policy network: The REINFORCE algorithm

    11.3  Value networks and Q-learning: The snake game example

    11.3.1  Snake as a reinforcement-learning problem

    11.3.2  Markov decision process and Q-values

    11.3.3  Deep Q-Network

    11.3.4  Training the deep Q-network

    11.4  Summary

    11.5  Materials for further reading

    11.6  Exercises