Chapter 8. Curiosity-driven exploration

 

This chapter covers

  • Understanding the sparse reward problem
  • Understanding how curiosity can serve as an intrinsic reward
  • Playing Super Mario Bros. from OpenAI Gym (a minimal setup sketch appears after this list)
  • Implementing an intrinsic curiosity module in PyTorch
  • Training a deep Q-network agent to play Super Mario Bros. successfully without relying on explicit (extrinsic) rewards
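
Before getting to curiosity itself, it helps to see how little code is needed to stand up the game as a Gym environment. The snippet below is a minimal sketch, not one of this chapter's listings: it assumes the third-party gym-super-mario-bros package (and its nes_py dependency), the JoypadSpace wrapper, the SIMPLE_MOVEMENT action set, and the classic four-value Gym step API.

import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# Create the game as an OpenAI Gym environment (assumed package and environment ID)
env = gym_super_mario_bros.make('SuperMarioBros-v0')
# Reduce the full NES controller to a small, discrete set of common moves
env = JoypadSpace(env, SIMPLE_MOVEMENT)

state = env.reset()                        # raw RGB frame, shape (240, 256, 3)
for _ in range(100):
    action = env.action_space.sample()     # random policy as a placeholder
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
env.close()

With the environment in hand, the rest of the chapter swaps the random policy for a deep Q-network and replaces the environment's extrinsic reward with a curiosity-based intrinsic reward.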

The fundamental reinforcement learning algorithms we have studied so far, such as deep Q-learning and policy gradient methods, are very powerful techniques in many situations, but they fail dramatically in certain environments. Google’s DeepMind pioneered the field of deep reinforcement learning back in 2013 when it used deep Q-learning to train an agent to play multiple Atari games at superhuman performance levels. But the agent’s performance was highly variable across different types of games. At one extreme, the DQN agent played the Atari game Breakout vastly better than a human, but at the other extreme it was much worse than a human at playing Montezuma’s Revenge (figure 8.1), where it could not even pass the first level.

Figure 8.1. Screenshot from the Montezuma’s Revenge Atari game. The player must navigate through obstacles to get a key before any rewards are received.

8.1. Tackling sparse rewards with predictive coding

8.2. Inverse dynamics prediction

8.3. Setting up Super Mario Bros.

8.4. Preprocessing and the Q-network

8.5. Setting up the Q-network and policy function

8.6. Intrinsic curiosity module

8.7. Alternative intrinsic reward mechanisms

Summary
