Members of the UWECEML lab have had a good couple of months, with several notable papers accepted at strong venues.

Publications

AAAI Paper on Decentralized Multi-Agent Reinforcement Learning

At this year’s AAAI Conference on Artificial Intelligence in (location), Sriram Ganapathi Subramanian presented the paper:

Ganapathi Subramanian, S., Taylor, M., Crowley, M., & Poupart, P. (2022). Decentralized Mean Field Games. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2022), 36(9), 9439–9447. https://doi.org/10.1609/aaai.v36i9.21176

You can read more about the research in this post about the paper by co-author Prof. Matt Taylor at the University of Alberta.

Canadian AI 2022

Bellinger, C., Drozdyuk, A., Crowley, M., & Tamblyn, I. (2022). Balancing Information with Observation Costs in Deep Reinforcement Learning. Canadian Conference on Artificial Intelligence, 12. https://caiac.pubpub.org/pub/0jmy7gpd

This paper (Bellinger et al., 2022), “Balancing Information with Observation Costs in Deep Reinforcement Learning,” builds on our earlier work (Beeler et al., 2022; Bellinger et al., 2022, 2021) on digital chemistry and material design, where we use Reinforcement Learning to discover better pathways for producing materials. This is a collaboration with the NRC. See the ChemGymRL project page for more information.

NeurIPS Workshop

Lee, K. M., Ganapathi Subramanian, S., & Crowley, M. (2021). Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments. NeurIPS 2021 Deep Reinforcement Learning Workshop, 15.

This workshop paper (Lee et al., 2021) and its presentation grew out of a project by undergraduate student Ken Ming Lee, who has worked in the lab as a URA for multiple terms on RL algorithms and software development. The original idea, an empirical study of RL algorithms in multi-agent settings, flowed out of results needed for Sriram’s PhD research. The paper is an empirical comparison of many single-agent and multi-agent algorithms across a range of multi-agent planning domains. A longer version of this work has been submitted to the Frontiers in AI journal for consideration (Lee et al., 2022).