Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper draws motivation from the multi-faceted nature of fairness: fairness perception is tied to context and moral reasoning and can evolve through social interaction and agent reflection. The authors propose an LLM-based multi-agent simulation framework for studying the evolutionary dynamics of fairness as they intertwine with agent decisions. The framework combines multi-agent reinforcement learning (MARL) to represent evolving contexts with LLM-based simulation to provide a reflection and feedback mechanism.
Using the framework, the authors implement two games, negotiated fairness and fairness learning, to examine the dynamics of fairness and the development of fairness consensus. Results from the negotiated fairness game show that social feedback shapes fairness norms in allocation. Results from the fairness learning game demonstrate that collective social experience shapes fairness consensus.
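For concreteness, my reading of the negotiated-fairness loop is roughly the following sketch (my own reconstruction of the loop described in the paper, not the authors' code; the two callables stand in for the GPT-4o prompting):

```python
# Minimal sketch of the negotiated-fairness loop as I understand it.
# All names here are my own; score_fn and propose_fn stand in for the
# paper's LLM prompts (peer scoring and reflective re-allocation).
from typing import Callable, Dict, List, Tuple

Allocation = Dict[str, int]                        # agent -> HP share
History = List[Tuple[Allocation, Dict[str, int]]]  # (proposal, peer scores)

def negotiate(score_fn: Callable[[str, Allocation], int],
              propose_fn: Callable[[History], Allocation],
              agents: List[str],
              rounds: int = 2) -> Allocation:
    recipients = agents[1:]          # agents[0] is the allocator
    history: History = []
    proposal = propose_fn(history)   # initial proposal from the allocator
    for _ in range(rounds - 1):
        # Each recipient scores the proposal from its own moral perspective.
        scores = {a: score_fn(a, proposal) for a in recipients}
        history.append((proposal, scores))
        # The allocator reflects on the scored history and revises.
        proposal = propose_fn(history)
    return proposal
```

If this reading is wrong (e.g., if recipients also observe each other's scores), it would help to state the loop this explicitly in the paper.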
1. The paper proposes a novel framework that embeds cognitive LLM agents in a multi-agent environment with a structured feedback mechanism.
2. The two games capture different aspects of fairness dynamics and provide nuanced insights. The authors report a variety of results from both games and provide interpretation from the perspective of fairness as an evolving construct.
3. The ablation study provides additional support for the chosen game design.
1. Lack of clarity in responses of non-allocator agents: The proposed framework focuses on the perspective of allocator agents, for which the experiments consider four possible moral traits. It is unclear what types of moral reasoning non-allocator agents have, and how they are set up to score proposed allocations. In the negotiated fairness game, the non-allocator agents are differentiated by their contribution distributions: for a given allocation, does an agent give a higher score if it believes its allotted portion fairly rewards its contribution, and a lower score otherwise? In the fairness learning game, the non-allocator agents have different roles: how do these role-specific perspectives apply in scoring allocations and providing justifications? I would expect these non-allocator agents' views to potentially impact fairness dynamics; for example, if these agents are all strongly selfish, social interaction and reflection may not lead to a fairness norm shift, as they would always give low scores to allocations unfavorable to themselves.
2. Unclear practical relevance of learning from pure LLM-based simulation: The paper studies a fully LLM-based framework and uses only LLM agents to draw conclusions about fairness dynamics and the evolution of fairness perception. It is unclear whether and how these findings apply in practice. A more general question is: what are possible use cases of the proposed framework? If the framework is intended to illustrate the complex dynamics of fairness perception and moral reasoning in human decision processes, a key assumption to justify is the validity of using LLM agent behaviors as a proxy for human behaviors.
3. Some result interpretations appear inconsistent with the figures: In Section 4.2, lines 295-296 state that "As shown in Fig. 2 (a), for most moral types (kin, reciprocal, universal), in the final stage, the HP distribution presents a more balanced pattern compared to the first stage." It is not immediately clear why this statement holds: what metric is used to evaluate the degree of balance?
Also, in Section 4.2, line 306 mentions a decreased emphasis on "Contribution" in the thinking text; however, in Figure 2(b), 'Contribution' is only shown as a keyword for 'Kin-Focused' and 'Universal'.
Then lines 347-350 compare first and second scores in Figure 3 and state that agents with relatively low first-round scores tend to see an increase in allocation ratio in the second round and have "second scores (square markers) higher than the first". Agent 2 (green bars) and agent 5 (red bars) in Figure 3(a) appear to contradict this statement.
In Section 4.3, lines 369-371 state that "From Fig. 4 (a), average scores and total allocated HP rise with experimental cycles", but some results do not show this trend, such as the orange and blue bars in Figure 4(a).
4. Minor: in Figure 3, the distinction between the saturated and transparent colors is difficult to read and interpret.
Please see my questions listed in the weaknesses. In addition, could the authors clarify how the presented experimental results respectively answer the two research questions at the end of page 1 (lines 52-53) and under Figure 1 (lines 72-73)?
Fully human-written |
Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper presents a computational framework designed to model fairness as a dynamic and evolving social construct rather than a static, predefined objective. The authors use multi-agent simulations populated with Large Language Model (LLM)-based "cognitive agents" to investigate how fairness norms can emerge and change through social interaction, negotiation, and feedback.
The agents in the simulation are assigned heterogeneous moral types (such as Self-focused, Kin-focused, Reciprocal, and Universal) to create diverse perspectives. The study's results indicate that social feedback mechanisms—like peer scoring and penalties—successfully steer agent behavior toward more balanced, group-endorsed fairness norms.
- Well-Written Game Setting: The two game scenarios (Negotiated Fairness after a group hunt and Fairness Learning with public resources) are well-designed, clearly explained, and create interesting moral tensions.
- Good Ablation Study: The authors ran useful ablation studies. They showed that removing the iterative negotiation step or the penalty mechanism led to worse fairness outcomes (lower scores, higher variance). This supports the claim that these feedback mechanisms are important for the results.
- First of all, I have concerns about generalizability. The authors claim they are introducing a "computational framework", but the experiments only use a single model (OpenAI's GPT-4o) and a specific set of agent prompts. How can this be a general framework? The results (e.g., selfish agents becoming more group-focused) might just be an artifact of this one LLM or the specific prompts. The paper does not show that the findings generalize to other models or scenarios.
- There is also a missing connection to human fairness. The paper's introduction is all about fairness as a human social construct, but the paper never connects the agent simulations back to humans. More specifically, there is no human baseline: how do we know these agents behave like people? There are also no human evaluators: how do we know the agents' final "fair" allocations are actually seen as fair by real people? Without this connection, the paper cannot support its claims of "captur[ing] key complexities of fairness" or offering "human-like subjective satisfaction".
- My second worry is the lack of concrete baselines. The related work mentions classic fairness concepts from economics, such as inequity aversion and Nash welfare, but the experiments never compare against them. How does this complex LLM agent simulation compare to a simpler behavioral model from economics?
- I am also unconvinced about the innovation in this paper. The "method" is an agent-based simulation that calls an LLM API and uses prompts for reflection. This is not a new learning algorithm.
- Finally, I am wary of the broad claims. The paper claims to capture complex ideas like "ambiguity, procedural justice, and subjective satisfaction", but it is not clear how the experiments show this. The metrics used are just HP allocation amounts, scores from other agents, and keyword frequency in text. This seems too coarse to support claims about deep concepts like "procedural justice" or "fairness" in general. The economic literature, for instance, provides precise, testable definitions (e.g., envy-freeness, equity, or inequity aversion); can the authors specify which ones they are referring to? The standard definitions I have in mind are given below.
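For reference, for an allocation $x$ of a resource $X$ among $n$ agents, where agent $i$ receives bundle $x_i$ and has utility $u_i$:

$$\text{Envy-freeness:}\quad u_i(x_i) \ge u_i(x_j)\ \ \forall i,j$$
$$\text{Proportionality:}\quad u_i(x_i) \ge \tfrac{1}{n}\,u_i(X)\ \ \forall i$$
$$\text{Maximum Nash welfare:}\quad x^{\ast} = \arg\max_x \prod_{i} u_i(x_i)$$
$$\text{Fehr–Schmidt inequity aversion:}\quad U_i(x) = x_i - \frac{\alpha_i}{n-1}\sum_{j\neq i}\max(x_j - x_i,\,0) - \frac{\beta_i}{n-1}\sum_{j\neq i}\max(x_i - x_j,\,0)$$

Even reporting how often the final allocations satisfy (or approximately satisfy) such criteria would ground the claims.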
- What is the human baseline? Why did you choose not to run any human experiments or use human evaluators, especially since your motivation is all about human fairness?
- The simulations run for a limited number of turns (e.g., 10 cycles in the Learning game). What are your predictions for longer-horizon interactions? Do you expect agents to reach a stable consensus, or will they keep shifting?
- You start immediately with heterogeneous agents. What happens in a simpler setting with homogeneous agents (e.g., all "Reciprocal")?
Fully human-written |
Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper studies fairness as a dynamic, negotiated construct using a text-based multi-agent simulation with LLM “cognitive agents.” Two scenarios are proposed: (1) Negotiated Fairness after group hunting, where an allocator iteratively proposes distributions and receives peer scoring plus reflection; and (2) Fairness Learning with semi-public resources, where allocators observe historical cases, receive recipient scores, face penalties under a threshold, and reflect before the next round. Agents have moral archetypes (Selfish, Kin, Reciprocal, Universal). The paper reports that (i) low first-round scores correlate with larger allocator concessions, (ii) keyword shifts indicate movement from contribution accounting toward survival and group cooperation, (iii) self-interest ratios for selfish agents decline over cycles under scoring and penalty feedback, and (iv) ablations show iterative negotiation and penalties are important for convergence. The work aims to bridge MARL’s learning-without-justification and LLM agents’ justification-without-learning by adding explicit feedback and reflection while keeping evaluation quantitative.
- Treats fairness as negotiated, contextual, and evolving rather than a fixed metric. This is timely and relevant for LLM-mediated decision settings.
- Combines LLM agent negotiation, explicit feedback, and structured reflection to study norm formation. The two scenarios (negotiation and learning with penalties) are clear and complementary.
- Ablations are well chosen and align with the claims. Removing a negotiation round or removing penalties worsens alignment and stability, which supports the causal story.
- The paper avoids being purely narrative by using concession magnitudes, alignment to contribution-based reference, dispersion across identities, and tracked self-interest ratios. The keyword analyses help explain behavioral shifts.
- Results rely on a single LLM (GPT-4o) in synthetic prehistoric settings with hand-crafted roles. There is no human study, no behavioral replication of classic bargaining results, and no validation against established lab findings beyond qualitative analogies.
- There is no direct comparison to strong MARL or rule-based baselines under identical scenarios with matched feedback channels, nor to recent LLM multi-agent frameworks with learning-style updates (e.g., memory consolidation or lightweight preference fitting). As a result, the incremental advantage of the proposed reflection plus scoring loop is not fully quantified.
- The paper focuses on distributive trends and subjective satisfaction but does not probe tradeoffs among standard ML fairness criteria or causal fairness frames. Even within economic fair division, only a contribution baseline is used; other baselines (e.g., MNW, EF1, proportionality) are not contrasted in outcomes and acceptance (a minimal baseline sketch is given after this list). A relevant paper to look at is: Dai, Gordon, and Yunze Xiao. "Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems." arXiv preprint arXiv:2505.18139 (2025).
- Agents reflect and observe prior cases, but there is no explicit parameter update or policy learning with held-out evaluation. The work claims to link justification and learning, yet the learning is mostly via prompt context and reflection rather than measurable policy change. This weakens the “learning to be fair” position.
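To make the missing-baseline point concrete, here is a minimal sketch of the kind of rule-based comparator I would expect (my own code, not from the paper; the contribution values are made up). For a divisible HP budget $B$, maximizing contribution-weighted Nash welfare, $\max \prod_i x_i^{c_i}$ subject to $\sum_i x_i = B$, has the closed form $x_i = B\,c_i / \sum_j c_j$:

```python
# My own sketch of a trivial fair-division baseline (not the authors' code):
# a contribution-weighted maximum-Nash-welfare split of a divisible HP budget.
def weighted_mnw_split(contributions: dict[str, float], budget: float) -> dict[str, float]:
    total = sum(contributions.values())
    # Closed-form optimum of max prod_i x_i^{c_i} s.t. sum_i x_i = budget.
    return {agent: budget * c / total for agent, c in contributions.items()}

# Illustrative numbers only: five hunters with unequal contributions.
print(weighted_mnw_split({"a1": 5, "a2": 4, "a3": 3, "a4": 2, "a5": 1}, budget=75.0))
```

Running such a comparator through the same recipient-scoring channel would quantify what the reflection-plus-scoring loop adds over a one-line rule.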
1. Even a small human study that rates final allocations or negotiator behavior would improve credibility. Can you show that human raters prefer the negotiated outcomes to ablated ones?
2. What happens if recipients strategically down-score to extract concessions? Are there any safeguards or an equilibrium analysis? (A toy illustration of the incentive is sketched after these questions.)
3. How do results change across different LLMs and decoding settings?
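To illustrate question 2 with a toy example (my own numbers, assuming a hypothetical concession rule rather than anything stated in the paper): suppose the allocator revises a recipient's share as

$$x_i^{(2)} = x_i^{(1)} + k\,\max(s^{\ast} - s_i,\,0),$$

where $s_i$ is the reported score and $s^{\ast}$ is a neutral level. A recipient whose honest score is $7$ with $s^{\ast} = 5$ gains nothing by reporting truthfully, but gains $3k$ HP by reporting $2$; with $k = 1$ and a 15 HP stake, that is a 20% increase from misreporting alone. Unless scores are costly or cross-checked, truthful scoring is not incentive-compatible.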
Fully AI-generated |
Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper presents an LLM-based multi-agent simulation environment that is used to study two games related to fairness in resource allocation tasks. The main contribution is the implementation of the LLM-powered simulation environment and an illustration of how it can be used to run the two games. While the related studies are important and the prompt design of the agents for setting up the games is interesting, the contribution to the field of representation learning is not clear to me. Also, it is not clear how representative the two games are and how the simulation can be scaled up.
The problem of using LLM-based multi-agent simulation to study fairness issues in resource allocation is important and interesting.
The implemented simulation system can enable social-science studies by leveraging the power of LLMs, which I think is not restricted to fairness issues.
A) Novelty:
While the related studies are important and the prompt design of the agents for setting up the games is interesting, the contribution to the field of representation learning is not clear.
B) Methodologies:
Two games are implemented using the LLM-based multi-agent simulation system. It is not clear how representative the two games are with only five agents involved, for example in terms of the particular game design and settings. What will happen if the simulation scales up with many more agents? Also, the particular settings, such as 15 HP for each agent and 25 HP for the allocator, need more justification for both games. And what happens if these settings are changed?
C) Paper organization and presentation clarity:
The games are built on top of prompts designed to leverage the LLM for actions in each phase of the games, including reflection, reasoning, etc. The current presentation makes it hard to grasp this by reading only the main text of Section 3; readers need to scan through the Appendix before a more complete picture can be obtained. Some of the key details should be moved back to the main text.
The font size of the figures is too small, in particular in Figures 2 and 3, which are supposed to illustrate how the agents evolve during gameplay and should be considered important for this paper.
D) Simulation results:
Section 4.2
In Figure 2, the labels of the x-axis are "retain", "agent_2", "agent_3", "agent_4", and "agent_5". More explanation is needed.
Thinking pattern shift – The shift is studied only by referencing the keyword changes before and after the game. It would be good to present more details revealing how the reasoning evolves after each phase, and to include some cases, which may help readers better understand the underlying strengths and limitations of the LLM-based approach and point toward possible enhancements.
Subjective fairness and objective index – One of the results is that "agents perceiving unfairness in the first round will receive more in the second round". This does not seem a particularly surprising result, and it is not easy to see the overall gain in terms of fairness as a whole.
Section 4.3 Results of Negotiation Fairness Game
This section is a bit hard to read; it seems that some social science background is expected. Also, terms like "child_agent" and "agent_child" are mentioned without explanation (page 7). In addition, the concept of a "circle" is mentioned and used throughout the section without a definition.
Q1: How representative are the two games for illustrating the impact of the proposed simulation system?
Q2: What will happen if the number of agents is greatly increased?
Q3: Other than keyword changes, can more details be presented to better evaluate how the reasoning evolves after each phase? Can some case studies be added?
Q4: What are the surprising findings from the simulations, and what are their implications for the field of representation learning?
Fully human-written |