This chapter covers
- Understanding the sparse reward problem
- Understanding how curiosity can serve as an intrinsic reward
- Playing Super Mario Bros. from OpenAI Gym
- Implementing an intrinsic curiosity module in PyTorch
- Training a deep Q-network agent to successfully play Super Mario Bros. without using explicit (extrinsic) rewards
The fundamental reinforcement learning algorithms we have studied so far, such as deep Q-learning and policy gradient methods, are powerful techniques in many situations, but they fail dramatically in others. Google’s DeepMind pioneered the field of deep reinforcement learning back in 2013 when they used deep Q-learning to train an agent to play multiple Atari games at superhuman performance levels. But the agent’s performance was highly variable across different types of games. At one extreme, their DQN agent played the Atari game Breakout vastly better than a human, but at the other extreme the DQN was far worse than a human at playing Montezuma’s Revenge (figure 8.1), where it could not even pass the first level.
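The trouble in games like Montezuma’s Revenge is that rewards are sparse: the agent must execute a long, specific sequence of actions before it ever sees a nonzero reward, so random exploration almost never stumbles onto a learning signal. To make that concrete, here is a toy simulation (not from the book; the function name and chain environment are illustrative assumptions): an agent on a 1D chain takes uniformly random steps, and we estimate how often it reaches the single rewarding state within one episode as that state is moved farther away.

```python
import random

def chance_of_reward(goal_dist, episode_len, n_episodes=2000, seed=0):
    """Estimate how often a purely random agent on a 1D chain
    (start at position 0, step -1 or +1, floor at 0) reaches the
    single rewarding state `goal_dist` steps away within one episode.
    Toy illustration of reward sparsity, not an API from the book."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_episodes):
        pos = 0
        for _ in range(episode_len):
            pos = max(0, pos + rng.choice((-1, 1)))
            if pos == goal_dist:
                hits += 1
                break
    return hits / n_episodes

# The farther away the only reward is, the rarer the learning signal
# becomes, even though the environment itself is trivially simple:
for d in (2, 6, 12):
    print(d, chance_of_reward(d, episode_len=50))
```

Running this shows the hit rate collapsing as the goal moves away, which is exactly why a reward-driven learner like DQN stalls: with almost no episodes ever producing a reward, there is nothing to propagate back through the Q-function. The chapter’s answer is to supply an intrinsic reward, curiosity, that is dense even when the extrinsic reward is not.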