8 Introduction to value-based deep reinforcement learning
- You will understand the inherent challenges of training reinforcement learning agents with non-linear function approximators.
- You will create a deep reinforcement learning agent that, when trained from scratch with minimal adjustments to hyperparameters, can solve different kinds of problems.
- You will identify the advantages and disadvantages of using value-based methods when solving reinforcement learning problems.
Human behavior flows from three main sources: desire, emotion, and knowledge.
— Plato, a philosopher in Classical Greece and founder of the Academy in Athens
- The kind of feedback deep reinforcement learning agents use
  - Deep reinforcement learning agents deal with sequential feedback
  - But, if it isn't sequential, what is it?
  - Deep reinforcement learning agents deal with evaluative feedback
  - But, if it isn't evaluative, what is it?
  - Deep reinforcement learning agents deal with sampled feedback
  - But, if it isn't sampled, what is it?
- Introduction to function approximation for reinforcement learning
  - Reinforcement learning problems can have high-dimensional state and action spaces
  - Reinforcement learning problems can have continuous state and action spaces
  - There are advantages when using function approximation
- NFQ: The first attempt at value-based deep reinforcement learning
  - First decision point: Selecting a value function to approximate
  - Second decision point: Selecting a neural network architecture
  - Third decision point: Selecting what to optimize
  - Fourth decision point: Selecting the targets for policy evaluation
  - Fifth decision point: Selecting an exploration strategy
  - Sixth decision point: Selecting a loss function
  - Seventh decision point: Selecting an optimization method
  - Things that could (and do) go wrong
- Summary