10 Sample-efficient value-based methods
In this chapter
- You will implement a deep neural network architecture that exploits some of the nuances of value-based deep reinforcement learning methods.
- You will create a replay buffer that prioritizes experiences by how surprising they are.
- You will build an agent that trains to a near-optimal policy in fewer episodes than any of the value-based deep reinforcement learning agents we’ve discussed so far.
Intelligence is based on how efficient a species became at doing the things they need to survive.
— Charles Darwin, English naturalist, geologist, and biologist, best known for his contributions to the science of evolution
In the previous chapter, we improved on NFQ by implementing DQN and DDQN. In this chapter, we continue this line of improvements by presenting two additional techniques for enhancing value-based deep reinforcement learning methods. This time, though, the improvements aren’t so much about stability, even though that can be a welcome by-product. Rather, the techniques presented in this chapter improve the sample efficiency of DQN and other value-based DRL methods.
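To preview the core idea behind the prioritized replay buffer mentioned above, here is a minimal sketch, not the book's implementation, of proportional prioritization with a hypothetical sample_indices helper: experiences are sampled with probability proportional to a power of their absolute TD error, so the most surprising experiences are replayed most often.

```python
import numpy as np

# Minimal sketch of proportional prioritization (hypothetical helper,
# for illustration only): larger absolute TD errors => higher sampling
# probability, so the agent extracts more learning from each step.
def sample_indices(td_errors, batch_size, alpha=0.6, eps=1e-6):
    # priority_i = (|td_error_i| + eps)^alpha; alpha=0 recovers uniform sampling
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=probs)

# Usage: the experiences with large TD errors (indices 1 and 4) tend to
# dominate the sampled batch.
td_errors = np.array([0.01, 2.5, 0.1, 0.03, 1.2])
batch = sample_indices(td_errors, batch_size=3)
```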