10 Sample-efficient value-based methods

 

In this chapter

  • You will implement a deep neural network architecture that exploits nuances specific to value-based deep reinforcement learning methods.
  • You will create a replay buffer that prioritizes experiences by how surprising they are.
  • You will build an agent that reaches a near-optimal policy in fewer episodes than any of the value-based deep reinforcement learning agents we’ve discussed so far.

Intelligence is based on how efficient a species became at doing the things they need to survive.

— Charles Darwin, English naturalist, geologist, and biologist, best known for his contributions to the science of evolution

In the previous chapter, we improved on NFQ with the implementation of DQN and DDQN. In this chapter, we continue that line of work by presenting two additional techniques for improving value-based deep reinforcement learning methods. This time, though, the improvements aren’t so much about stability, although that can be a welcome by-product; rather, the techniques presented in this chapter improve the sample efficiency of DQN and other value-based DRL methods.

Dueling DDQN: A reinforcement-learning-aware neural network architecture

 
 
 

Reinforcement learning isn’t a supervised learning problem

 
 
 

Nuances of value-based deep reinforcement learning methods

 
 
 
 

Advantage of using advantages
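
The action-advantage function at the center of this section follows the standard definition: it measures how much better it is to take action a in state s than to act as usual from s. Stated as an equation:

$$A^\pi(s, a) = Q^\pi(s, a) - V^\pi(s)$$

Because V^π(s) is shared by every action available in s, learning the state value once and learning only the per-action differences A^π(s, a) can take fewer samples than estimating each Q^π(s, a) independently.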

 
 

A reinforcement-learning-aware architecture

 
 
 

Building a dueling network
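
As a concrete reference for this section, here is a minimal PyTorch sketch of a dueling architecture: a shared trunk feeding two heads, one estimating the state-value function V(s) and one estimating the action advantages A(s, a). The class name, layer sizes, and activations are illustrative assumptions, not the book's exact implementation.

```python
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Sketch of a dueling Q-network (hidden sizes are assumptions)."""
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        # Shared feature trunk
        self.features = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Two streams: state value V(s) and advantages A(s, a)
        self.value = nn.Linear(hidden_dim, 1)
        self.advantage = nn.Linear(hidden_dim, action_dim)

    def forward(self, state):
        x = self.features(state)
        v = self.value(x)          # shape (batch, 1)
        a = self.advantage(x)      # shape (batch, n_actions)
        # Aggregate: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=1, keepdim=True)
```

Note that the two streams share the trunk's gradients: every update improves the V(s) estimate regardless of which action was taken, which is where the sample-efficiency gain comes from.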

 
 

Reconstructing the action-value function
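
The two streams can't simply be trained and read off separately, because Q is not uniquely recoverable from V and A: adding a constant to V(s) and subtracting it from every A(s, a) produces the same Q(s, a). The aggregation layer resolves this by subtracting the mean advantage, as in Wang et al. (2016):

$$Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right)$$

where θ are the shared parameters and α and β are the parameters of the advantage and value streams, respectively. Forcing the advantages to be zero-mean identifies both streams up to that constraint.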

 
 
 

Continuously updating the target network
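
Rather than copying the online weights into the target network every N steps, the target can be nudged toward the online network a little on every step, a technique known as Polyak averaging. A minimal sketch, assuming PyTorch models; the mixing-factor value is an assumption:

```python
def polyak_update(online_model, target_model, tau=0.005):
    # Mix a fraction tau of the online weights into the target network.
    # tau=1.0 recovers a full copy; small tau keeps the target slowly moving.
    for target_p, online_p in zip(target_model.parameters(),
                                  online_model.parameters()):
        target_p.data.copy_(tau * online_p.data + (1.0 - tau) * target_p.data)
```

A slowly moving target preserves the stabilizing effect of a frozen network while letting fresh value estimates flow into the targets continuously.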

 
 

What does the dueling network bring to the table?

 
 
 
 

PER: Prioritizing the replay of meaningful experiences

 
 

A smarter way to replay experiences

 
 
 

Then, what’s a good measure of “important” experiences?

 
 

Greedy prioritization by TD error
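
The absolute TD error is the natural first candidate for a priority: it measures how far the current estimate is from the bootstrapped target, which is a proxy for how surprising the experience still is. For a DQN-style target with target-network parameters θ⁻:

$$\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a'; \theta^-) - Q(s_t, a_t; \theta)$$

Replaying experiences strictly in order of |δ| has known failure modes, though: TD errors are only refreshed for the transitions that get replayed, so an experience that happens to arrive with a low error may never be sampled again, and the agent ends up overfitting to a small, noise-sensitive subset of the buffer.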

 
 

Sampling prioritized experiences stochastically
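
The remedy is to sample stochastically, interpolating between uniform sampling and greedy prioritization. Given positive priorities p_i, the probability of replaying experience i is, following Schaul et al. (2016):

$$P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}$$

where α = 0 recovers uniform sampling and larger values of α push toward pure greedy prioritization. The next two subsections differ only in how the priority p_i itself is defined.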

 
 

Proportional prioritization
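
In the proportional variant, the priority is the absolute TD error plus a small constant, p_i = |δ_i| + ε, so that experiences with near-zero error still have some chance of being replayed. A minimal NumPy sketch of the sampling step; the function name and the α and ε values are assumptions, and a production implementation would use a sum-tree instead of renormalizing a flat array on every sample:

```python
import numpy as np

def sample_proportional(td_errors, batch_size, alpha=0.6, eps=1e-6):
    # Priority: p_i = (|TD error| + eps) ** alpha
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    # Draw experience indices with probability proportional to priority
    idxs = np.random.choice(len(td_errors), size=batch_size, p=probs)
    return idxs, probs[idxs]  # probs are needed later for bias correction
```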

 
 

Rank-based prioritization
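
The rank-based variant is less sensitive to outliers: the buffer is sorted by |δ|, and the priority depends only on an experience's position in that ordering:

$$p_i = \frac{1}{\text{rank}(i)}$$

Because the sampling distribution depends only on ranks, a single enormous TD error can't dominate the replay the way it can under proportional prioritization, at the cost of maintaining a sorted structure.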

 
 
 
 

Prioritization bias
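
Sampling non-uniformly changes the distribution the updates are computed under, which biases the value estimates. PER compensates with importance-sampling weights that scale each sampled experience's TD error in the loss, w_i = (N · P(i))^(-β), normalized by the largest weight. A minimal sketch of that computation; β is typically annealed toward 1 over training so the correction is full by the time the estimates converge:

```python
import numpy as np

def importance_weights(probs, buffer_len, beta):
    # w_i = (N * P(i)) ** -beta, normalized by the max weight so the
    # correction only ever scales updates down (Schaul et al., 2016)
    weights = (buffer_len * probs) ** -beta
    return weights / weights.max()
```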

 

Summary

 
 
 