8 Introduction to value-based deep reinforcement learning

 

In this chapter

  • You will understand the inherent challenges of training reinforcement learning agents with non-linear function approximators.
  • You will create a deep reinforcement learning agent that, when trained from scratch with minimal adjustments to hyperparameters, can solve different kinds of problems.
  • You will identify the advantages and disadvantages of using value-based methods when solving reinforcement learning problems.

Human behavior flows from three main sources: desire, emotion, and knowledge.

— Plato, a philosopher in Classical Greece and founder of the Academy in Athens

The kind of feedback deep reinforcement learning agents use

 
 
 

Deep reinforcement learning agents deal with sequential feedback

 
 

But, if it isn’t sequential, what is it?

 
 
 

Deep reinforcement learning agents deal with evaluative feedback

 
 
 

But, if it isn’t evaluative, what is it?

 
 

Deep reinforcement learning agents deal with sampled feedback

 
 
 

But, if it isn’t sampled, what is it?

 
 

Introduction to function approximation for reinforcement learning

 
 
 

Reinforcement learning problems can have high-dimensional state and action spaces

 

Reinforcement learning problems can have continuous state and action spaces

 
 
 

There are advantages when using function approximation

 

NFQ: The first attempt at value-based deep reinforcement learning

 
 
 

First decision point: Selecting a value function to approximate

 
 
 

Second decision point: Selecting a neural network architecture

 
 

Third decision point: Selecting what to optimize

 
 

Fourth decision point: Selecting the targets for policy evaluation

 
 
 

Fifth decision point: Selecting an exploration strategy

 
 

Sixth decision point: Selecting a loss function

 

Seventh decision point: Selecting an optimization method

 
 
 

Things that could (and do) go wrong

 
 
 

Summary

 
 