8 Introduction to value-based deep reinforcement learning

 

In this chapter:

  • You understand the inherent challenges of training reinforcement learning agents with non-linear function approximators.
  • You create a deep reinforcement learning agent that, when trained from scratch with minimal hyperparameter adjustments, can solve different kinds of problems.
  • You identify the advantages and disadvantages of using value-based methods when solving reinforcement learning problems.

Human behavior flows from three main sources: desire, emotion, and knowledge.

— Plato, a philosopher in classical Greece and founder of the Academy in Athens

8.1   The kind of feedback deep reinforcement learning agents use

8.1.1   Deep reinforcement learning agents deal with sequential feedback

8.1.2   But, if it is not sequential, what is it?

8.1.3   Deep reinforcement learning agents deal with evaluative feedback

8.1.4   But, if it is not evaluative, what is it?

8.1.5   Deep reinforcement learning agents deal with sampled feedback

8.1.6   But, if it is not sampled, what is it?

8.2   Introduction to function approximation for reinforcement learning

8.2.1   Reinforcement learning problems can have high-dimensional state and action spaces

8.2.2   Reinforcement learning problems can have continuous state and action spaces

8.2.3   There are advantages when using function approximation

8.3   NFQ: The first attempt at value-based deep reinforcement learning

8.3.1   First decision point: Selecting a value function to approximate

8.3.2   Second decision point: Selecting a neural network architecture

8.3.3   Third decision point: Selecting what to optimize

8.3.4   Fourth decision point: Selecting the targets for policy evaluation

8.3.5   Fifth decision point: Selecting an exploration strategy

8.3.6   Sixth decision point: Selecting a loss function

8.3.7   Seventh decision point: Selecting an optimization method

8.3.8   Things that could (and do) go wrong

8.4   Summary
