Majid Ghasemi only began his doctoral studies this past September 2025, but he has already dived into his research on ethical decision making with reinforcement learning with gusto, producing initial results and getting multiple workshop papers accepted.

A side project coming out of one of his graduate courses led to a paper at the recent Canadian AI Conference in Vancouver (Ghasemi & Crowley, 2026). The paper combines analysis and theory to provide the first solid stability analysis of a type of scaling-limit idea for RL, work that could serve as a foundation for anyone interested in the field.

Ghasemi, M., & Crowley, M. (2026). Scaling Limits of Deep Reinforcement Learning: A Stability Analysis with Maximal Update Parametrization. Canadian Conference on Artificial Intelligence, 6.

Majid also submitted an abstract to a unique biannual interdisciplinary conference on Formal Ethics (Ghasemi & Crowley, 2026). Prof. Crowley will present it in Buffalo in July. The talk will explore new dimensions of how we can see ethics through the lens of reinforcement learning, arguing that classic RL and constraint optimization methods have structural limits on the type of ethical reasoning they can perform.

Ghasemi, M., & Crowley, M. (2026, July). Talk: Designing Virtuous Agents via Social Reinforcement Learning. Formal Ethics 2026.

One could argue that even with the use of RLHF to “align” Large Language Models to shared values, there remain a lot of blind spots in modern LLMs. Some of these failures arise from two types of drift away from the best, or at least better, outputs when measured with respect to truth and to ethics. In this workshop paper at ICML (Ghasemi & Crowley, 2026), Majid proposes Social Reinforcement Learning (Social RL) as a way of structurally enforcing feedback integrity. By situating agents in social environments driven by peer critique, reputation, observation, and sanction, Social RL treats alignment as an ongoing negotiation rather than a static specification problem, and offers mechanisms for correcting epistemic errors and stabilizing ethical norms in open-ended environments.

Ghasemi, M., & Crowley, M. (2026). Rethinking AI Alignment: From Static Rewards to Social Reinforcement Learning. Pluralistic Alignment Workshop at ICML 2026. 1. https://openreview.net/forum?id=dm2o7hw14F

In another workshop paper at ICML this year (Ghasemi & Crowley, 2026), Majid explores an old idea that seems to have been forgotten about how to combine rewards, or any scoring metrics, together in a multi-agent system. The main idea is that “consensus” isn’t always agreement amongst all the agents if there are “tyrant” agents or certain types of peer-influenced collaborations occurring. This has implications for using social interaction to develop ethical reasoning abilities using RL.

Ghasemi, M., & Crowley, M. (2026). Can Standard MARL Metrics Distinguish Communicative from Strategic Action? ICML 2026 Workshop: Philosophy Meets Machine Learning. 1. https://openreview.net/forum?id=diNaPaat4w

This paper was accepted last term to a workshop at the major international conference AAAI.

Ghasemi, M., & Crowley, M. (2026, January). Toward Virtuous Reinforcement Learning: A Critique and Roadmap. Workshop on Machine Ethics: From Formal Methods To Emergent Machine Ethics at AAAI 2026. 1.

He accomplished a lot in his first two terms, getting two short papers into a national conference and two more into international workshops. All this while turning in an outstanding performance in multiple grad courses, TAing a course, and now beginning an internship with an industry partner.

And to wrap that up, he even won a Faculty of Engineering Term Award for his accomplishments in Winter 2026.