This chapter covers
- Understanding the sparse reward problem
- Understanding how curiosity can serve as an intrinsic reward
- Playing Super Mario Bros. from OpenAI Gym
- Implementing an intrinsic curiosity module in PyTorch
- Training a deep Q-network agent to successfully play Super Mario Bros. without using explicit (extrinsic) rewards
The fundamental reinforcement learning algorithms we have studied so far, such as deep Q-learning and policy gradient methods, are powerful techniques in many situations, but they fail dramatically in others. Google’s DeepMind pioneered the field of deep reinforcement learning back in 2013 when they used deep Q-learning to train an agent to play multiple Atari games at superhuman performance levels. But the agent’s performance was highly variable across different types of games. At one extreme, their DQN agent played the Atari game Breakout vastly better than a human, but at the other extreme the DQN was far worse than a human at playing Montezuma’s Revenge (figure 8.1), where it could not even pass the first level.
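The trouble in games like Montezuma’s Revenge is that rewards are sparse: the agent must execute a long, specific sequence of actions before it ever sees a nonzero reward, so random exploration almost never stumbles onto a learning signal. To make that concrete, here is a toy simulation (not from the book; the function name and chain environment are illustrative assumptions): an agent on a 1D chain takes uniformly random steps, and we estimate how often it reaches the single rewarding state within one episode as that state is moved farther away.

```python
import random

def chance_of_reward(goal_dist, episode_len, n_episodes=2000, seed=0):
    """Estimate how often a purely random agent on a 1D chain
    (start at position 0, step -1 or +1, floor at 0) reaches the
    single rewarding state `goal_dist` steps away within one episode.
    Toy illustration of reward sparsity, not an API from the book."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_episodes):
        pos = 0
        for _ in range(episode_len):
            pos = max(0, pos + rng.choice((-1, 1)))
            if pos == goal_dist:
                hits += 1
                break
    return hits / n_episodes

# The farther away the only reward is, the rarer the learning signal
# becomes, even though the environment itself is trivially simple:
for d in (2, 6, 12):
    print(d, chance_of_reward(d, episode_len=50))
```

Running this shows the hit rate collapsing as the goal moves away, which is exactly why a reward-driven learner like DQN stalls: with almost no episodes ever producing a reward, there is nothing to propagate back through the Q-function. The chapter’s answer is to supply an intrinsic reward, curiosity, that is dense even when the extrinsic reward is not.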