- How reinforcement learning (RL) differs from the supervised learning covered in the previous chapters
- The basic paradigm of reinforcement learning: agent, environment, action, and reward, and the interactions between them
- The general ideas behind two major approaches to solving RL problems: policy-based and value-based methods
- A policy-based RL algorithm by example: using the policy gradients (PG) method to solve the cart-pole problem
- A Q-value-based RL algorithm by example: using a deep Q-network (DQN) to solve the snake game
11.1 The formulation of reinforcement-learning problems
11.2 Policy networks and policy gradients: The cart-pole example
11.2.1 Cart-pole as a reinforcement-learning problem
11.2.3 Training the policy network: The REINFORCE algorithm
11.3 Value networks and Q-learning: The snake game example
11.3.1 Snake as a reinforcement-learning problem
11.3.2 Markov decision process and Q-values
11.3.4 Training the deep Q-network
11.5 Materials for further reading