10 Interpretable Reinforcement Learning: Attention and Relational Models
In this chapter we learn how to:
- Implement a relational reinforcement learning algorithm using the popular self-attention model (see the short sketch after this list)
- Visualize attention maps in order to better interpret the reasoning of an RL agent
- Reason about model invariance and equivariance
- Incorporate Double Q-learning to improve the stability of training
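Because self-attention is the workhorse of the relational model in this chapter, it may help to keep a minimal sketch of scaled dot-product self-attention in mind from the start. The snippet below is illustrative only, not the chapter's actual model: the function name `self_attention`, the tensor shapes, and the plain matrix projections are assumptions made for the sketch, assuming PyTorch.

```python
import torch
import torch.nn.functional as F

def self_attention(x, W_q, W_k, W_v):
    # Illustrative sketch, not the chapter's model.
    # x: (N, d_in) node features; W_q/W_k/W_v: (d_in, d_k) projection matrices.
    Q = x @ W_q                              # queries, shape (N, d_k)
    K = x @ W_k                              # keys,    shape (N, d_k)
    V = x @ W_v                              # values,  shape (N, d_k)
    scores = (Q @ K.T) / K.shape[-1] ** 0.5  # pairwise compatibilities, scaled
    A = F.softmax(scores, dim=-1)            # attention map: each row sums to 1
    return A @ V, A                          # attended features (N, d_k), map (N, N)

# Hypothetical usage: 5 nodes with 8 features each, projected down to 4 dimensions
x = torch.randn(5, 8)
W_q, W_k, W_v = torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 4)
out, attn = self_attention(x, W_q, W_k, W_v)
print(out.shape, attn.shape)  # torch.Size([5, 4]) torch.Size([5, 5])
```

The attention map returned here is the kind of object we will visualize later in the chapter: each row tells us how strongly one node attends to every other node.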
By this point, we hope you have come to appreciate just how powerful the combination of deep learning and reinforcement learning is at solving tasks previously thought to be the exclusive domain of humans. Deep learning is a class of powerful learning algorithms that can recognize and reason over complex patterns in data, and reinforcement learning is the framework we use to solve control problems.
Throughout this book we’ve focused on using games as a laboratory for experimenting with reinforcement learning algorithms, because games let us assess these algorithms in a very controlled setting. When we build an RL agent that learns to play a game well, we are generally satisfied that our algorithm is working. Of course, reinforcement learning has many more applications outside of playing games, and in some of these other domains the raw performance of the algorithm on some metric (e.g., the accuracy percentage on some task) is not useful unless we also know how the algorithm is making its decisions.