
9 Multi-Agent Reinforcement Learning

 

In this chapter we learn

  • Why ordinary Q-learning can fail in the multi-agent setting
  • How to deal with the “curse of dimensionality” with multiple agents
  • How to implement multi-agent Q-learning models that can perceive other agents
  • How to scale multi-agent Q-learning by using the mean field approximation
  • How to use DQNs to control dozens of agents in a multi-agent physics simulation and game

9.1   From one to many agents

The reinforcement learning algorithms we have covered so far (Q-learning, policy gradients, and actor-critic methods) have all been applied to the case of controlling a single agent in an environment. But what about situations where we want to control more than one agent, and where those agents can interact with each other? The simplest example is a two-player game in which each player is implemented as a reinforcement learning agent, but there are also situations where we may want to model hundreds or thousands of individual agents all interacting with each other, such as a traffic simulation. In this chapter we will learn how to adapt what we've learned so far to the multi-agent setting by implementing an algorithm called Mean Field Q-learning (MF-Q), first described in the paper "Mean Field Multi-Agent Reinforcement Learning" by Yang et al. (2018).
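To make the shift from one agent to many concrete before we introduce any new machinery, here is a minimal sketch (not the book's code, and using a made-up toy environment) of the naive approach: run ordinary Q-learning independently for each agent. Every agent keeps its own Q-table and performs the usual epsilon-greedy update, while the toy_step function stands in for an environment in which each agent's reward depends on what the other agents do.

import numpy as np

np.random.seed(0)
num_agents = 3      # e.g. three vehicles or players interacting
num_states = 16     # toy discrete state space
num_actions = 4

# One independent Q-table per agent (tabular only to keep the sketch small)
q_tables = [np.zeros((num_states, num_actions)) for _ in range(num_agents)]

def select_action(q_row, epsilon=0.1):
    # Epsilon-greedy choice, exactly as in single-agent Q-learning
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    return int(np.argmax(q_row))

def toy_step(states, joint_action):
    # Made-up stand-in for a multi-agent environment: each agent's reward
    # depends on the actions of the *other* agents, which is what makes
    # the multi-agent setting different from everything in earlier chapters
    next_states = [(s + a) % num_states for s, a in zip(states, joint_action)]
    majority = np.bincount(joint_action).argmax()
    rewards = [1.0 if a == majority else 0.0 for a in joint_action]
    return next_states, rewards

gamma, lr = 0.9, 0.1
states = [np.random.randint(num_states) for _ in range(num_agents)]
for _ in range(100):
    joint_action = [select_action(q_tables[i][states[i]])
                    for i in range(num_agents)]
    next_states, rewards = toy_step(states, joint_action)
    for i in range(num_agents):
        s, a, r, s2 = states[i], joint_action[i], rewards[i], next_states[i]
        q_tables[i][s, a] += lr * (r + gamma * q_tables[i][s2].max()
                                   - q_tables[i][s, a])
    states = next_states

From each agent's point of view, the other agents are simply part of the environment in this sketch, which makes that environment nonstationary as the other agents learn. That is precisely why ordinary Q-learning can fail in the multi-agent setting, and the rest of the chapter shows how neighborhood and mean field Q-learning address the problem.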

9.2   Neighborhood Q-learning

9.3   The 1D Ising model

9.4   Mean field Q-learning and the 2D Ising model

9.5   Mixed cooperative-competitive games

9.6   Summary