ProToM: Promoting Prosocial Behaviour via Theory of Mind-Informed Feedback

1University of Stuttgart, 2Peking University, 3Harvard University, 4Massachusetts Institute of Technology, 5Johns Hopkins University

§Work done while at Johns Hopkins University



We aim to use AI to encourage prosocial interactions among human agents pursuing independent goals. We introduce ProToM, a facilitator that promotes prosocial actions by providing targeted, context-sensitive feedback to individual agents.

Abstract

While humans are inherently social creatures, the challenge of identifying when and how to assist and collaborate with others, particularly when pursuing independent goals, can hinder cooperation. To address this challenge, we aim to develop an AI system that provides useful feedback to promote prosocial behaviour: actions that benefit others, even when not directly aligned with one's own goals. We introduce ProToM, a Theory of Mind-informed facilitator that promotes prosocial actions in multi-agent systems by providing targeted, context-sensitive feedback to individual agents. ProToM first infers agents' goals using Bayesian inverse planning, then selects feedback to communicate by maximising expected utility, conditioned on the inferred goal distribution. We evaluate our approach against baselines in two multi-agent environments: Doors, Keys, and Gems, as well as Overcooked. Our results suggest that state-of-the-art large language and reasoning models fall short of communicating feedback that is both contextually grounded and well-timed, leading to higher communication overhead and lower success rates. In contrast, ProToM provides targeted and helpful feedback, achieves a higher success rate and shorter task completion times, and is consistently preferred by human users.

Problem Formulation

We consider an AI facilitator as an omniscient observer that provides feedback to n interacting agents with the goal of promoting prosocial behaviour. In the most general case, this setting can be formalised as a two-level POMDP:

  1. Inner agents play a partially observable game, each pursuing their own goals.
  2. The outer facilitator plays a POMDP whose hidden component is the observed agents' internal states, and whose only available action is communicating feedback.

Given an observed trajectory, the AI facilitator must decide whether to communicate feedback and, if so, which feedback to communicate.
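The two-level setting above can be sketched as a simple observe-and-decide loop. This is an illustrative sketch only: the environment, agent, and facilitator interfaces below are hypothetical placeholders, not the paper's actual API.

```python
def facilitator_loop(env, agents, facilitator, horizon=100):
    """Sketch of the two-level decision problem: inner agents act toward
    their own goals under partial observability, while the outer facilitator
    observes the joint trajectory and, at each step, chooses between staying
    silent and sending one feedback message to one agent."""
    trajectory = []
    state = env.reset()
    for _ in range(horizon):
        # Inner level: each agent acts on its own (partial) observation.
        actions = [agent.act(agent.observe(state)) for agent in agents]
        trajectory.append((state, actions))
        # Outer level: the facilitator's only action is feedback (or silence).
        feedback = facilitator.decide(trajectory)  # None means "stay silent"
        if feedback is not None:
            target, message = feedback
            agents[target].receive(message)
        state = env.step(actions)
    return trajectory
```

The key design point is that the facilitator never controls the environment directly; its influence is limited to the messages agents receive.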

ProToM


ProToM actively assists both agents in an environment by observing their actions, inferring their goals, and providing feedback when it is expected to improve prosocial behaviour.

Maintaining Beliefs About Agents

To simulate and interpret the behaviour of agents, ProToM maintains a dynamic belief distribution for each agent, capturing both their internal belief about the environment and their underlying goal. This belief distribution is approximated using a particle filter with N particles per agent, and the agents' goal distribution is computed using Bayesian inverse planning.
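The goal-inference step can be illustrated with a minimal Bayesian inverse planning sketch, assuming a Boltzmann-rational action likelihood derived from a goal-conditioned value function. The `q_value` callable is a hypothetical placeholder for whatever planner or value estimate backs each particle; it is not the paper's implementation.

```python
import math

def goal_posterior(trajectory, goals, q_value, prior=None, beta=1.0):
    """Bayesian inverse planning sketch: P(g | trajectory) is proportional to
    P(g) * prod_t P(a_t | s_t, g), where the action likelihood is a Boltzmann
    distribution over a goal-conditioned Q-function with rationality beta.
    trajectory: iterable of (state, chosen_action, available_actions)."""
    if prior is None:
        prior = {g: 1.0 / len(goals) for g in goals}
    log_post = {}
    for g in goals:
        log_p = math.log(prior[g])
        for state, action, actions in trajectory:
            logits = [beta * q_value(state, a, g) for a in actions]
            m = max(logits)  # log-sum-exp for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            log_p += beta * q_value(state, action, g) - log_z
        log_post[g] = log_p
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / z for g, v in log_post.items()}
```

Observed actions that are consistently rational for one goal concentrate the posterior on that goal; in the full system, one such update would run per particle, per agent.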

Feedback Selection

Given a finite set of candidate feedback messages, ProToM performs feedback selection in three steps.

  1. First, it evaluates each candidate feedback message by computing its expected utility, conditioned on the inferred goals and belief states of the agents.
  2. Second, ProToM decides whether to issue new feedback by comparing the expected utility of the best candidate against a threshold and estimating the divergence between the agent's predicted behaviour with and without the feedback.
  3. Finally, if a feedback message is selected, ProToM generates a corresponding explanation to help the agent understand the reasoning behind the suggestion.
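The first two steps above can be sketched as follows. The utility and divergence functions, and both thresholds, are hypothetical stand-ins for the paper's actual scoring and behaviour-prediction components.

```python
def select_feedback(candidates, goal_posterior, utility, divergence,
                    threshold=0.0, min_divergence=0.1):
    """Feedback-selection sketch: (1) score each candidate message by its
    expected utility under the inferred goal distribution; (2) issue the best
    candidate only if its score clears a threshold AND it is predicted to
    actually change the agent's behaviour (divergence check)."""
    best, best_eu = None, float("-inf")
    for f in candidates:
        # Expected utility, marginalised over the inferred goal posterior.
        eu = sum(p * utility(f, g) for g, p in goal_posterior.items())
        if eu > best_eu:
            best, best_eu = f, eu
    # Stay silent if the feedback is not expected to help, or if the agent
    # would have behaved the same way without it.
    if best_eu < threshold or divergence(best) < min_divergence:
        return None
    return best
```

Returning `None` corresponds to staying silent, which keeps communication overhead low when feedback would be redundant.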

Results


Compared to the baselines, ProToM achieves higher success rates and greater task speedups, with reduced communication overhead.


Our human study also showed that ProToM's feedback was perceived as more helpful and appropriate, better explained, and better aligned with users' goals.

Example Scenarios

BibTeX


@article{bortoletto2025protom,
  title={ProToM: Promoting Prosocial Behaviour via Theory of Mind-Informed Feedback},
  author={Bortoletto, Matteo and Zhou, Yichao and Ying, Lance and Shu, Tianmin and Bulling, Andreas},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}