9 More stable value-based methods
In this chapter:
- You improve on the methods you learned in the previous chapter by making them more stable and therefore less prone to divergence.
- You explore advanced value-based deep reinforcement learning methods, and the many components that make value-based methods better.
- You solve the cart-pole environment with fewer samples, and with more reliable and consistent results.
Let thy step be slow and steady, that thou stumble not.
— Tokugawa Ieyasu, Founder and first shōgun of the Tokugawa shogunate of Japan, and one of the three unifiers of Japan
In the last chapter, you learned about value-based deep reinforcement learning. NFQ, the algorithm we developed, is a simple solution to the two most common issues value-based methods face. First, data in RL is not independent and identically distributed.