Multi-Agent Reinforcement Learning

MARL is the problem of learning how to make decisions from experience in the presence of multiple other decision making agents.

Standard Reinforcement Learning studies how to build computational agents that can learn how to make decisions from interaction from their environment alone, even without a prior understanding of how that that environment works. This field is closely connected with human and animal learning and uses the idea of rewards obtained implicitely from the environment or explicitely from a trainer.

The field of Multi-Agent Reinforcement Learning has been growing steadily interest and complexity in recent years. This is RL in the more complex situation where there are other agents in the environment to interact with who also impact the reward obtained by our learning agent. Now these other agents could be teammates, oponents or neutral strangers and the learning agent might interact directly or indirectly with them. Game Theory is the study of a particular subset of this problem where the domain is generally well understood and while there may be hidden information which agents may not have access to, the agents do not need to learn about the environment itself in order to take actions.

Our Papers on Multi-Agent Reinforcement Learning

Mean Field MARL

Decentralized Mean Field Games

Sriram Ganapathi Subramanian, Matthew Taylor, Mark Crowley, and Pascal Poupart.

In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022). Virtual. Feb, 2022.

Abs arXiv PDF URL Hypoth

Multiagent reinforcement learning algorithms have not been widely adopted in large scale environments with many agents as they often scale poorly with the number of agents. Using mean field theory to aggregate agents has been proposed as a solution to this problem. However, almost all previous methods in this area make a strong assumption of a centralized system where all the agents in the environment learn the same policy and are effectively indistinguishable from each other. In this paper, we relax this assumption about indistinguishable agents and propose a new mean field system known as Decentralized Mean Field Games, where each agent can be quite different from others. All agents learn independent policies in a decentralized fashion, based on their local observations. We define a theoretical solution concept for this system and provide a fixed point guarantee for a Q-learning based algorithm in this system. A practical consequence of our approach is that we can address a ‘chicken-and-egg’ problem in empirical mean field reinforcement learning algorithms. Further, we provide Q-learning and actor-critic algorithms that use the decentralized mean field learning approach and give stronger performances compared to common baselines in this area. In our setting, agents do not need to be clones of each other and learn in a fully decentralized fashion. Hence, for the first time, we show the application of mean field learning methods in fully competitive environments, large-scale continuous action space environments, and other environments with heterogeneous agents. Importantly, we also apply the mean field method in a ride-sharing problem using a real-world dataset. We propose a decentralized solution to this problem, which is more practical than existing centralized training methods.
MARLEmpircal

Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Ken Ming Lee, Sriram Ganapathi Subramanian, and Mark Crowley

Frontiers in Artificial Intelligence. Sep, 2022.

Abs PDF URL

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on seven PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. For the cooperative setting, we show that independent algorithms can perform on par with multi-agent algorithms in fully-observable environments, while adding recurrence improves the learning of independent algorithms in partially-observable environments. In the competitive setting, independent algorithms can perform on par or better than multi-agent algorithms, even in more challenging environments. We also show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies in mixed environments.
MARLEmpircal

Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments

Ken Ming Lee, Sriram Ganapathi Subramanian, and Mark Crowley

In NeurIPS 2021 Deep Reinforcement Learning Workshop. Dec, 2021.

Abs arXiv

Independent reinforcement learning algorithms have no theoretical guarantees for finding the best policy in multi-agent settings. However, in practice, prior works have reported good performance with independent algorithms in some domains and bad performance in others. Moreover, a comprehensive study of the strengths and weaknesses of independent algorithms is lacking in the literature. In this paper, we carry out an empirical comparison of the performance of independent algorithms on four PettingZoo environments that span the three main categories of multi-agent environments, i.e., cooperative, competitive, and mixed. We show that in fully-observable environments, independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings. For the mixed environments, we show that agents trained via independent algorithms learn to perform well individually, but fail to learn to cooperate with allies and compete with enemies. We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.
PO-MFRL

Partially Observable Mean Field Reinforcement Learning

Sriram Ganapathi Subramanian, Matthew Taylor, Mark Crowley, and Pascal Poupart.

In Proceedings of the 20th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, London, United Kingdom. May, 2021.

Abs arXiv PDF

Traditional multi-agent reinforcement learning algorithms are not scalable to environments with more than a few agents, since these algorithms are exponential in the number of agents. Recent research has introduced successful methods to scale multi-agent reinforcement learning algorithms to many agent scenarios using mean field theory. Previous work in this field assumes that an agent has access to exact cumulative metrics regarding the mean field behaviour of the system, which it can then use to take its actions. In this paper, we relax this assumption and maintain a distribution to model the uncertainty regarding the mean field of the system. We consider two different settings for this problem. In the first setting, only agents in a fixed neighbourhood are visible, while in the second setting, the visibility of agents is determined at random based on distances. For each of these settings, we introduce a Q-learning based algorithm that can learn effectively. We prove that this Q-learning estimate stays very close to the Nash Q-value (under a common set of assumptions) for the first setting. We also empirically show our algorithms outperform multiple baselines in three different games in the MAgents framework, which supports large environments with many agents learning simultaneously to achieve possibly distinct goals.
Deep Multi Agent Reinforcement Learning for Autonomous Driving

Sushrut Bhalla, Sriram Ganapathi Subramanian, and Mark Crowley

In Canadian Conference on Artificial Intelligence. May, 2020.

PDF URL
Learning Multi-Agent Communication with Reinforcement Learning

Sushrut Bhalla, Sriram Ganapathi Subramanian, and Mark Crowley

In Conference on Reinforcement Learning and Decision Making (RLDM-19). Montreal, Canada. 2019.
Training Cooperative Agents for Multi-Agent Reinforcement Learning

Sushrut Bhalla, Sriram Ganapathi Subramanian, and Mark Crowley

In Proc. of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019). Montreal, Canada. 2019.