11 Basics of Deep Reinforcement Learning · Deep Learning with JavaScript: Neural networks in TensorFlow.js

chapter eleven

How reinforcement learning (RL) differs from the supervised learning covered in the previous chapters

The basic paradigm of reinforcement learning: agent, environment, action, and reward, and the interactions between them

The general ideas behind two major approaches to solving RL problems: policy-based and value-based methods

Policy-based RL algorithm through example: using the policy gradients (PG) method to solve the cart-pole problem

Value-based RL algorithm through example: using a deep Q-network (DQN) to solve the snake game

11.1 The formulation of reinforcement-learning problems

11.2 Policy networks and policy gradients: The cart-pole example

11.2.1 Cart-pole as a reinforcement-learning problem

11.2.2 Policy network

11.2.3 Training the policy network: The REINFORCE algorithm

11.3 Value networks and Q-learning: The snake game example

11.3.1 Snake as a reinforcement-learning problem

11.3.2 Markov decision process and Q-values

11.3.3 Deep Q-network

11.3.4 Training the deep Q-network

11.4 Summary

11.5 Materials for further reading

11.6 Exercises
