
9 Multi-Agent Reinforcement Learning

 

In this chapter we learn

  • Why ordinary Q-learning can fail in the multi-agent setting
  • How to deal with the “curse of dimensionality” with multiple agents
  • How to implement multi-agent Q-learning models that can perceive other agents
  • How to scale multi-agent Q-learning by using the mean field approximation
  • How to use DQNs to control dozens of agents in a multi-agent physics simulation and game

9.1   From one to many agents

The reinforcement learning algorithms we have covered so far (Q-learning, policy gradients, and actor-critic methods) have all been applied to the case of controlling a single agent in an environment. But what about situations where we want to control more than one agent, and where those agents can interact with each other? The simplest example is a two-player game in which each player is implemented as a reinforcement learning agent, but there are also situations where we may want to model hundreds or thousands of individual agents all interacting with each other, such as a traffic simulation. In this chapter we will learn how to adapt what we've learned so far to the multi-agent setting by implementing an algorithm called Mean Field Q-learning (MF-Q), first described in the paper "Mean Field Multi-Agent Reinforcement Learning" by Yang et al. (2018).
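To make the shift from one agent to many concrete before we introduce any new machinery, here is a minimal sketch (not the book's code, and using a made-up toy environment) of the naive approach: run ordinary Q-learning independently for each agent. Every agent keeps its own Q-table and performs the usual epsilon-greedy update, while the toy_step function stands in for an environment in which each agent's reward depends on what the other agents do.

import numpy as np

np.random.seed(0)
num_agents = 3      # e.g. three vehicles or players interacting
num_states = 16     # toy discrete state space
num_actions = 4

# One independent Q-table per agent (tabular only to keep the sketch small)
q_tables = [np.zeros((num_states, num_actions)) for _ in range(num_agents)]

def select_action(q_row, epsilon=0.1):
    # Epsilon-greedy choice, exactly as in single-agent Q-learning
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    return int(np.argmax(q_row))

def toy_step(states, joint_action):
    # Made-up stand-in for a multi-agent environment: each agent's reward
    # depends on the actions of the *other* agents, which is what makes
    # the multi-agent setting different from everything in earlier chapters
    next_states = [(s + a) % num_states for s, a in zip(states, joint_action)]
    majority = np.bincount(joint_action).argmax()
    rewards = [1.0 if a == majority else 0.0 for a in joint_action]
    return next_states, rewards

gamma, lr = 0.9, 0.1
states = [np.random.randint(num_states) for _ in range(num_agents)]
for _ in range(100):
    joint_action = [select_action(q_tables[i][states[i]])
                    for i in range(num_agents)]
    next_states, rewards = toy_step(states, joint_action)
    for i in range(num_agents):
        s, a, r, s2 = states[i], joint_action[i], rewards[i], next_states[i]
        q_tables[i][s, a] += lr * (r + gamma * q_tables[i][s2].max()
                                   - q_tables[i][s, a])
    states = next_states

From each agent's point of view, the other agents are simply part of the environment in this sketch, which makes that environment nonstationary as the other agents learn. That is precisely why ordinary Q-learning can fail in the multi-agent setting, and the rest of the chapter shows how neighborhood and mean field Q-learning address the problem.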

9.2   Neighborhood Q-learning

9.3   The 1D Ising model

9.4   Mean field Q-learning and the 2D Ising model

9.5   Mixed cooperative-competitive games

9.6   Summary