8 Introduction to value-based deep reinforcement learning


In this chapter

  • You will understand the inherent challenges of training reinforcement learning agents with non-linear function approximators.
  • You will create a deep reinforcement learning agent that, when trained from scratch with minimal adjustments to hyperparameters, can solve different kinds of problems.
  • You will identify the advantages and disadvantages of using value-based methods when solving reinforcement learning problems.

Human behavior flows from three main sources: desire, emotion, and knowledge.

— Plato
A philosopher in Classical Greece and founder of the Academy in Athens

The kind of feedback deep reinforcement learning agents use

Deep reinforcement learning agents deal with sequential feedback

But, if it isn’t sequential, what is it?

Deep reinforcement learning agents deal with evaluative feedback

But, if it isn’t evaluative, what is it?

Deep reinforcement learning agents deal with sampled feedback

But, if it isn’t sampled, what is it?

Introduction to function approximation for reinforcement learning

Reinforcement learning problems can have high-dimensional state and action spaces

Reinforcement learning problems can have continuous state and action spaces

There are advantages when using function approximation

NFQ: The first attempt at value-based deep reinforcement learning

First decision point: Selecting a value function to approximate

Second decision point: Selecting a neural network architecture

Third decision point: Selecting what to optimize

Fourth decision point: Selecting the targets for policy evaluation

Fifth decision point: Selecting an exploration strategy

Sixth decision point: Selecting a loss function

Seventh decision point: Selecting an optimization method

Things that could (and do) go wrong

Summary
