When Empowerment Disempowers
Soundness: 3: good
Presentation: 4: excellent
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
This paper studies whether an assistant that empowers a specific person may inadvertently disempower a bystander. The authors develop a gridworld benchmark with two goal-directed humans and a single assistant that is trained to empower one of the humans. They compare the empowerment and reward gained by the assisted human (the user) and the bystander, and show that the assistant often decreases the bystander's empowerment. They additionally evaluate an assistant trained to empower both the user and the bystander, but find that disempowerment still occurs.

Strengths:
- Good presentation of the idea that an empowering agent may disempower others; to my knowledge this study is novel among published work.
- Broad evaluation of different methods of measuring and maximizing empowerment, even including related notions such as reachable states.
- Good survey of prior work on assistance via empowerment.
- Clear presentation of arguments that flows naturally and is very readable.

Weaknesses:
- To me, the main question is still unanswered. I would greatly appreciate an argument as to why empowerment in particular disempowers other agents. Are these effects caused by something about the empowerment objective, or by something about assistance in general? I may have missed it, but I did not see an argument for why we should expect empowerment to be more disempowering than other forms of single-person assistance.
- Limited evaluation. The gridworld benchmark consists of 110 environments, but they do not vary in walls, blocks, or initial agent positions, only in the key and goal locations. Empowerment-type objectives can depend heavily on the shape of the environment. It would be helpful to see more environments that vary in shape (different walls, blocks, and agent positions), as well as more discussion of how these factors affect whether an agent disempowers another. I suspect these variations matter more than varying the key/goal positions.
- It is a cliché, but going beyond the gridworld would be very helpful. The gridworld is sufficient to show that empowerment-type assistance can disempower a bystander, but the more interesting question is whether this is a product of empowerment or of assistance in general. A larger environment might go a long way towards revealing the differences between empowerment and other types of assistance.
- No evaluation of other types of assistance. The paper's main claims are supported by the current evaluation, mostly because the paper does not claim that empowerment is particularly disempowering; but if the paper were to add that claim, it would be very helpful to compare against other types of assistance.
- In Figure 5, I am not convinced that disempowerment is actually occurring to a significant degree. The bystander's reward looks nearly unaffected, and their empowerment is also fairly flat.
- Some minor notes:
- Lines 46-47, "empower one person's empowerment", should be "empower one person"
- Lines 106-107, I think the agent with the key would only have a higher effective empowerment if they actually used the key to expand the number of states they visit. For example, the agent could have a key but still just no-op and therefore have 0 effective empowerment.
- Lines 235-237, I do not think equation 4 is a lower bound on the true effective empowerment. For example, the effective empowerment of a no-op policy is 0, whereas that of a random policy is > 0, so an estimate based on random rollouts can exceed the true value.
- Figure 6, "no-op" is missing from the legend
- Line 407, there is an unnecessary ")" after "Figure 3".

Questions:
- How are you measuring when an agent is "disempowered"? For example, in Figure 5 the bystander's empowerment does go down, but barely, and in fact it ends up pretty close to the user's final empowerment. The same is true for Figure 3: the user's final empowerment is pretty close to the bystander's. In Figure 7 you count the number of times the bystander is disempowered; what threshold do you use?
- I am a little confused about how you are actually measuring empowerment with rollouts. Are you independently estimating the entropies H[S_T | s] and H[S_T | s, A_T]? How many rollouts do you use?
- How are the feature vectors of the states constructed?
- I probably missed this, but I am a little confused about how the reward is summed over each trajectory. From my understanding, the goal state gives a reward of 1. Do the user and bystander stand on the goal state to keep collecting that reward, and is that how the bystander starts off with a reward of ~30 under the no-op assistant in Figure 3?
- The empowerment values seem quite high to me. For example, in Figure 6 with No-Op the bystander has an empowerment of about 40. If we assume H[S^+ | s, a] = 0, so that I(S^+; a | s) reduces to H[S^+ | s], then for the entropy over future states to be that large there would have to be at least 2^40 possible future states (with equality for a uniform distribution). Maybe I am missing something here? (See the short bound spelled out after these questions.)
- How are you estimating empowerment in the joint-empowerment case? Is it the same rollout estimator described in Eq. 4? Are S_T^U and S_T^B different here? Shouldn't they be the same?
- How often is the user disempowered in the joint-empowerment case? I see the results for bystander disempowerment, but in this case the objective is symmetric, so user disempowerment may be worth looking at (unless it is similar).
- What types of environments does the joint empowerment fail in vs succeed in? Is there a pattern here?
- Do you train the empowering assistant alongside a human pursuing the same goal that is used at evaluation time?
- In lines 413-414, can't the assistant cut off the hallway? Maybe I am misunderstanding the environment here.
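
To make the entropy bound referenced above explicit (a generic information-theoretic argument, not something taken from the paper; it assumes the reported values are in bits):

$$
I(S^+; a \mid s) \;=\; H[S^+ \mid s] - H[S^+ \mid s, a] \;\le\; H[S^+ \mid s] \;\le\; \log_2 \lvert \mathcal{S}^+ \rvert ,
$$

so an empowerment value of 40 bits would require at least $2^{40}$ distinct reachable future states (or $e^{40}$ if the entropies are in nats). Either number seems implausible for these gridworlds unless the values are accumulated over time steps or reported in different units.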
Fully human-written

---

When Empowerment Disempowers
Soundness: 3: good
Presentation: 4: excellent
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
The paper considers how empowerment objectives interact with multi-agent environments. It modifies the traditional setup, in which an assistant interacts with a single user, by adding a “bystander” agent that can be empowered or disempowered separately from the user. Across different gridworld environments, the paper shows that training the assistant with several empowerment objectives leads to disempowerment of the bystander agent, even when the bystander’s empowerment is explicitly included in the objective.

Strengths:
- Important problem: considering disempowerment of other agents in assistive settings.
- Proposes new gridworld environments to evaluate bystander disempowerment and shows that naively adding a bystander-empowerment term is insufficient.

Weaknesses:
- All the empowerment metrics are computed under a uniform random policy. It would be nice to use more sophisticated approximations of empowerment (e.g., [[1](https://arxiv.org/pdf/2411.02623), [2](http://arxiv.org/abs/2509.22504)]). In realistic settings (language, robotics, etc.) the action space is large enough that empowerment estimates that do not optimize over the policy are not useful (a minimal sketch of the estimator I have in mind appears at the end of this section, after the references).
- Only small deterministic gridworlds are considered (presumably because the approximations used become intractable in larger state spaces).
- It is unclear whether bystander disempowerment is fundamental to empowerment-based assistants or a result of the partly zero-sum environment dynamics (see questions).
---
[1] Myers, V., et al. (2024). "[Learning to Assist Humans Without Inferring Rewards](https://arxiv.org/pdf/2411.02623)." Neural Information Processing Systems.
[2] Song, J., et al. (2025). "[Estimating the Empowerment of Language Model Agents](http://arxiv.org/abs/2509.22504)." arXiv:2509.22504.
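
To make the concern concrete, here is a minimal sketch (my own code, not the authors'; the environment interface and all names are hypothetical) of the kind of plug-in estimator I understand the paper to be using: roll out uniform random actions from a state and take the empirical entropy of the resulting terminal states. Nothing in this procedure optimizes over the action distribution, which is the property I am worried about in larger action spaces.

```python
import numpy as np
from collections import Counter

def uniform_policy_empowerment(env, s0, horizon, n_actions, n_rollouts, rng):
    """Plug-in estimate of I(A_{1:T}; S_T | s0) under uniform random actions.

    Assumes deterministic dynamics, so H[S_T | s0, A_{1:T}] = 0 and the mutual
    information reduces to the entropy of the terminal-state distribution,
    i.e. a reachability-entropy measure. `env.step_from(state, action)` is a
    hypothetical deterministic transition function; states must be hashable.
    """
    finals = Counter()
    for _ in range(n_rollouts):
        s = s0
        for _ in range(horizon):
            a = int(rng.integers(n_actions))   # uniform random action
            s = env.step_from(s, a)            # other agents folded into the dynamics
        finals[s] += 1
    probs = np.array(list(finals.values()), dtype=float) / n_rollouts
    return float(-np.sum(probs * np.log2(probs)))  # empirical H[S_T | s0] in bits
```

A more useful estimate would optimize (or amortize) over the action distribution, or at least tie it to a learned model of the humans' behavior, rather than fixing it to uniform; that is the distinction I care about here.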

Questions:
- The paper analyzes a joint empowerment ($I_B + I_E$) objective. Can the bystander disempowerment problem be solved by optimizing some transformed version of the objective $f(I_B, I_E)$, such as $f(I_B, I_E) = \min(I_B, I_E)$? (A tiny numerical illustration of why the aggregation matters follows these questions.)
- How many rollouts are used to approximate the four assistant objectives? How many states are there in the gridworld environments?
- Can we co-train all of the agents ($\pi_A, \pi_U, \pi_B$)?
- Is the disempowerment phenomenon a reflection of the goals of the user and bystander being in conflict? How often does this occur (i.e., what is the gap between the bystander's optimal value function and their value function under the optimal assistant for the user's goal)? Is the disempowerment phenomenon worse for an empowering assistant than it would be for a goal-inference agent like in [[1](https://proceedings.neurips.cc/paper/2020/hash/30de9ece7cf3790c8c39ccff1a044209-Abstract.html)]?
- Can you discuss how these findings relate to more complex domains like language (see [[2](http://arxiv.org/abs/2509.22504)])? Do we expect findings with uniform random policies in a small deterministic setting to generalize?
- How are states represented in computing the variance in Eq. (5)?
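
To make the aggregation question above concrete (my own toy numbers, not taken from the paper): under the summed objective the assistant is rewarded for any trade that raises the user's empowerment more than it lowers the bystander's, whereas a max-min style objective only improves when the worse-off agent improves. For example,

$$
(I_B, I_E): (3, 5) \to (1, 9) \quad\Longrightarrow\quad I_B + I_E: 8 \to 10 \ \ (\text{preferred}), \qquad \min(I_B, I_E): 3 \to 1 \ \ (\text{penalized}).
$$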
Minor:
- Inconsistent capitalization in bibliography
- Eq. (5): $Var$ $\implies$ $\text{Var}$ (typeset upright)
---
[1] Du, Y., et al. (2020). "[AvE: Assistance via Empowerment](https://proceedings.neurips.cc/paper/2020/hash/30de9ece7cf3790c8c39ccff1a044209-Abstract.html)." Advances in Neural Information Processing Systems.
[2] Song, J., et al. (2025). "[Estimating the Empowerment of Language Model Agents](http://arxiv.org/abs/2509.22504)." arXiv:2509.22504.
Fully human-written

---

When Empowerment Disempowers
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |

Summary:
This paper hypothesizes that, in assistive multi-agent settings, maximizing the assisted agent’s empowerment could *disempower* other agents/humans. The authors identify this phenomenon in controlled test cases through empirical experiments run in their proposed suite of multi-agent gridworlds.

Strengths:
- The paper is well-written. The authors clearly state their scope and contributions.
- The authors contribute new gridworld environments to study the effect of empowerment in situations with more agents involved.
- The proposed domains enable easy variation for conducting the empirical evaluation.
- They show empirically that the naive solution they consider, optimizing the joint empowerment, is not enough to solve the disempowerment issue.
- The paper includes a varied set of approximations to empowerment in their study.

Weaknesses:
- The evaluation domains are quite simple, and I am concerned that they are not enough to model real-world domains, though I understand that this can work for an initial suite.
- I’m not quite sure why it is a good idea for the assistant to model the user as random. It seems to me that this choice might itself cause the disempowerment: if the bystander and user had learned a *good* equilibrium, wouldn’t an assistant that treats the user as purely random cause exactly this kind of disempowerment?

Overall, I believe this paper raises a good question about how assisting one person can affect others, and it definitely seems relevant, but I am not fully sure that it is not readily solvable by some MARL solution concept. I would appreciate it if the authors could extend the discussion of why this phenomenon is not readily addressed by a more rigorous formalization of the problem and existing solution concepts.

Questions:
- Shouldn’t the bystander policy adapt to the new situations induced by the assistant? Is the assumption that the bystander’s policy is fixed reasonable?
- Is joint empowerment optimization the only solution worth considering for the disempowerment problem? What about other solution concepts that would make the problem more amenable?
Fully human-written

---

When Empowerment Disempowers
Soundness: 2: fair
Presentation: 3: good
Contribution: 1: poor
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |

Summary:
This paper studies empowerment in a multi-agent setting motivated by assistive robotics, where a robot aims to help a specific person (e.g., a nurse) while also interacting with other agents in the environment. The authors benchmark several existing definitions of empowerment and other intrinsic reward formulations in gridworld environments, showing that optimizing empowerment for one agent can inadvertently disempower others. The paper raises an important issue for intrinsic motivation and AI safety and is clearly written, but the experiments are limited in scope and the benchmark’s simplicity limits the overall impact.

Strengths:
1. The paper highlights a subtle safety failure mode of intrinsic motivation methods in multi-agent settings.
2. The presentation is clear, and the problem setup is easy to follow.

Weaknesses:
1. Section 4.1: It is unclear why assuming a uniform policy for the empowerment target is reasonable. A uniform policy might bias the empowerment calculation toward disempowering others.
2. The notion of “disempowerment” could be made more precise.
3. It is unclear whether the four different tasks considered in the experimental section measure different effects.
4. Experimental plots appear to correspond to a single layout for a given task, which makes it difficult to assess generality across layouts.
5. The paper is primarily a benchmark study with no methodological novelty. The benchmark itself, while conceptually interesting, is relatively simple to implement and lacks realism.
6. The choice of methods benchmarked appears somewhat limited and may have been influenced by ease of implementation. However, I am not sufficiently familiar with the broader literature to determine whether this choice was primarily due to practicality or relevance.

Questions:
1. Could the authors clarify what “disempowerment” means quantitatively?
2. In Section 4.1, why is joint empowerment not treated as one of the goal-agnostic objectives?
3. How do the four different tasks differ in the types of empowerment interactions they elicit? That is, could a method perform well in one of them but not in the others?
4. If the empowerment target follows an optimal (rather than uniform) policy, how would this change the assistant’s incentives? This seems crucial for understanding whether the results generalize beyond random behavior. (A note on what the non-uniform case would involve follows below.)
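
For context on what question 4 would involve (this is the standard variational treatment from the empowerment literature, not something the paper does): empowerment proper is defined by maximizing over the action source, and practical estimators typically optimize a variational lower bound rather than fixing the source to a uniform policy. Writing $\omega$ for the action source and $q$ for a variational decoder,

$$
\mathfrak{E}_T(s) \;=\; \max_{\omega(a_{1:T} \mid s)} I(A_{1:T}; S_T \mid s) \;\ge\; \max_{\omega,\, q}\; \mathbb{E}\!\left[\log q(a_{1:T} \mid s_T, s) - \log \omega(a_{1:T} \mid s)\right],
$$

with the bound tight when $q$ matches the true action posterior. Fixing $\omega$ to uniform removes the maximization entirely, so the resulting number can differ substantially from empowerment as defined, and the assistant's incentives could change accordingly.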
Lightly AI-edited

---

When Empowerment Disempowers
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |

Summary:
The authors propose a multi-agent gridworld environment to study assistance in a setting with a primary user, an assistant, and a bystander. The core contribution is an empirical study of the side effects of using empowerment estimates as an assistance objective, specifically investigating how it can lead to the "disempowerment" of the bystander. The authors test four distinct goal-agnostic objectives and find that, across all of them, the assistant's actions consistently result in the bystander being disempowered.

Strengths:
The paper brings to light an important and practical problem: the naive extension of single-agent, goal-agnostic objectives (like empowerment) to multi-agent settings. The core issue, that empowering one user does not automatically translate to a positive or even neutral outcome for others, is a critical consideration for real-world AI deployment. The primary contributions are the "Disempower-Grid" benchmark itself, which extends existing dyadic environments with a bystander, and the thorough empirical validation using four different objectives, which helps raise the right research questions for the community.

Weaknesses:
- There is a real lack of comparison with alternative solutions. The paper only tests joint empowerment. What about, for example, a simple setup with an explicit cost for disempowering the bystander, or one where the assistant has access to the user policies (as an extreme case)? We need to see at least initial results for other alignment approaches.
- The authors mention environmental factors, but on close inspection the environments appear to be biased towards zero-sum dynamics. It is hard to tell how much of the disempowerment really comes from the objectives versus being an artifact of how the environment is set up.
- There's no deep investigation of the objectives themselves. They all cause disempowerment, fine, but how? Is there a difference? The paper is missing this comparative analysis.

Questions:
- After the first stage of training, in which the human agents are trained, what do the empowerment results from the rollouts look like for both the user and the bystander across the different environments?
- The description of discrete choice mentions that the user's future states could be affected by the bystander. But in fact, the state could be affected by the bystander's and the assistant's actions as well. For all the rollouts used to approximate the goal-agnostic objectives, how are the bystander's and assistant's actions picked? Are they uniformly random, or do they follow their learned policies (assuming those are part of the environment from the user's point of view)?
- What about alternative orientations of the assistant and bystander with respect to the user? For example, what if the bystander is at (3,2) instead of (0,1) (zero-based indexing) in the Push/Pull, Push Only, and procedurally generated experiments? Would these still exhibit similar levels of zero-sum dynamics, if any at all? I think the current orientation is such that the assistant's actions, driven by the goal-agnostic objectives, would mostly result in zero-sum dynamics, so it is hard to tell how much of that comes from the goal-agnostic objective rather than from how the environment is set up.
- Further, all the objectives do something similar: they increase the information/reachability/variance/entropy of the future states available to the user under the user's actions in the rollouts. With the discount factor added in, I would intuitively expect the assistant's actions to have a higher probability of freeing up the states around the user. Is this behavior fundamentally what causes the bystander's disempowerment? If the bystander happens to be on the wrong side of the environment we see zero-sum dynamics, whereas if it is near the user it might be empowered too?
- For the spatial bottlenecks, what are the probabilities of the other actions, i.e., of the assistant moving out of the user's path? Also, can you clarify how the assistant's state is treated when computing the empowerment objectives? For the push/pull settings, since the assistant is embodied, does the user treat its cell as blocked?
- For Figures 3 and 4 (Push/Pull and Push Only), why is the no-op reward different in the bystander plots? If the assistant no-ops, I would expect the reward to be the same in these two settings.
- Could you help me understand what the results/behavior would look like in a Move Only environment, if that has been considered? That is, an environment where the assistant cannot push or pull any of the boxes, so it makes no changes to the environment and can only navigate. In this case, would disempowerment look like the assistant blocking the spatial bottleneck for the bystander in order to empower the user?
- For the Move Any environment, to better understand the objectives and the assistant's intentions, it would help to know the probabilities with which the assistant picks each of the different blocks to move when disempowering the bystander. Does the assistant prefer to unblock the user by moving the blocks closest to the user first? Even this environment is set up such that perturbing 3 out of 5 blocks would disempower the bystander, i.e., there is a higher prior chance of the bystander being disempowered, so it is hard to attribute the disempowerment to the objectives alone.
- For Move Any and Freeze, if we can discretize, what is the assistant's per-step action space? I think it is [Freeze Bystander, Move Block, Do Nothing]. What is the horizon T for the rollouts when computing the objectives? What causes the user's reward to increase in the figures if the assistant has learned to freeze the bystander (since the block still would not allow the user to reach its goal)?
- The procedurally generated environments only modify the key and goal positions, while much of the disempowerment could be coming from how the blocks are positioned between the user and the bystander. It would be worthwhile to see results for other block orientations, or for fully randomized block placements.
Fully human-written

---

When Empowerment Disempowers
Soundness: 1: poor
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
The paper focuses on how goal-agnostic assistance objectives behave in settings with more than one human. The authors identify what they term a critical gap in prior work: empowerment-based and choice-based assistance have been studied almost exclusively in single-human contexts, leaving multi-human effects unexplored. To investigate this, they introduce a set of gridworld environments (“Disempower-Grid”) with two PPO-trained human agents (a user and a bystander) and one assistant. The assistant is trained under several goal-agnostic objectives, including empowerment I(A; S'), an assistance-via-empowerment (AvE) proxy, and two choice-based variants. Across multiple experimental conditions and environment configurations, the authors compare these objectives and find consistent patterns in which optimizing empowerment or related goals for one human alters the other's control and reward dynamics. To test robustness across goal variations, the authors also procedurally generate 110 versions of a single environment by permuting goal and key positions, finding consistent disempowerment patterns. Finally, they evaluate a joint-empowerment objective as a mitigation, which partly reduces disempowerment but also weakens assistance to the primary user. The results collectively suggest that empowerment-style objectives require additional design considerations before being applied reliably in multi-human assistance contexts.

Strengths:
- The paper raises a clear and pertinent question about whether goal-agnostic assistance objectives, such as empowerment and choice-based measures, behave safely in multi-human environments.
- There are several aspects of the experimental setup that are interesting and valuable standalone:
- the instantiation of a bystander agent to support a toy setup helps to obtain interpretable results.
- I like the use of both embodied and non-embodied assistant variants, allowing clear differentiation between direct physical interference and indirect influence through environment manipulation.
- The systematic comparison between the four objectives provides good empirical depth.
- The procedurally generated environments are a good first step towards testing the robustness of this setup.
- The writing is clear, the figures are interpretable, and the experimental setups are easy to follow and reproduce.

Weaknesses:
- While the paper addresses an important and open challenge in the domain of AI assistants, I am hugely concerned about the limitations of the setup considered. Specifically, the authors hypothesize and make predictions about the emergence of disempowerment as a consequence of single-agent empowerment pursued by an AI assistant. However, I feel that the setup considered in this paper falls far short of providing a convincing test of this hypothesis, for the following reasons:
* I am not convinced by the user and bystander setup in how it is instantiated. The simulated “humans” are implemented as frozen PPO-trained agents rather than any form of human-proxy or adaptive behavioral model. This makes the setup unrealistic for studying assistance, since humans would not act as stationary, fully rational policies. Without adaptation or preference uncertainty, the results say more about interactions between fixed RL policies than about multi-human assistance.
* The training pipeline is internally inconsistent. During empowerment estimation, the assistant assumes that humans act according to a uniform random policy, even though the humans in evaluation follow PPO-learned strategies. Because empowerment I(A;S’|s) depends on the distribution p(a|s), this mismatch violates the underlying definition and breaks the link between the optimization target and the true interaction dynamics.
* One of my major concerns also stems from the presentation of the setup as general-sum. While the overall setup does seem to be non-competitive from the user's perspective, the situations are effectively competitive from the assistant's perspective. In many tasks, it is very difficult for the assistant to help the user without blocking or restricting the bystander's motion (for example, in the Push/Pull and Freeze conditions). The observed "disempowerment" therefore arises trivially from the environment's geometry rather than from a deeper property of empowerment objectives.
- I am also concerned with the empowerment formulation, which appears coarse in this context.
- Empowerment is computed with rollout sampling under a uniform policy, making it a measure of reachability entropy (the reduction is spelled out after the references below). This seems to weaken the theoretical grounding of the objective and undermines claims about empowerment itself.
* Local empowerment tends to reward occupying bottlenecks or high-control regions. In shared environments, this naturally suppresses others’ reachable states. The paper interprets this as an emergent safety failure, but it is a direct and predictable consequence of the local control bias in empowerment maximization.
* I may be misunderstanding this part, but the paper seems to claim that empowerment-based objectives are “goal-agnostic” and therefore safer, which would be misleading. Empowerment intrinsically encodes a value preference for influence and optionality. The assistant's tendency to dominate shared control spaces is an expected outcome of optimizing for control, not an unforeseen failure mode.
* As per my understanding, the comparison across empowerment, AvE, and choice-based objectives lacks normalization or calibration. Each is estimated using different scaling or entropy measures, so their quantitative differences are not directly comparable. As a result, the figures illustrate relative trends rather than meaningful metric comparisons.
* The paper misses a very important discussion and comparison with multi-principal assistance games [1] and related multi-human alignment frameworks. These models explicitly address how an assistant should balance incentives among multiple principals and analyze trade-offs between efficiency and fairness. Situating the disempowerment findings relative to MPAG theory would clarify whether the observed behaviors violate established normative assumptions or merely reflect missing social-welfare constraints.
* The paper also does not include comparison with strong baseline explicitly designed to prevent harmful side effects, such as Stepwise Relative Reachability [2]. Including this would test whether the observed disempowerment persists under known mitigation frameworks and would contextualize the result within existing alignment research.
Finally, I find some critical issues with the empirical analysis presented in the paper:
* The empirical analysis lacks depth in both ablation and evaluation. There are no statistical tests, only qualitative mentions of significant effects, which makes the magnitude and consistency of the disempowerment effect uncertain. It might also be helpful to examine the robustness across horizons or rollout samples.
* The joint-empowerment formulation is overly simplistic. Summing empowerment across agents introduces a crude trade-off that predictably reduces assistance effectiveness without addressing the core issue of conflicting control incentives. More principled multi-agent formulations, such as relational or max-min empowerment, are not explored.
* The motivating examples, such as the hospital assistant scenario, overextend the implications of the toy gridworld findings. Real assistive systems are trained under explicit multi-principal or social-welfare objectives; hence, the leap from small-scale spatial disempowerment to real-world alignment concerns is highly speculative.
[1] Multi-Principal Assistance Games, Arnaud Fickinger, Simon Zhuang, Dylan Hadfield-Menell, Stuart Russell, 2020
[2] Penalizing side effects using stepwise relative reachability, Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg, 2019.
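
To spell out the reduction mentioned above (my own derivation from the standard definitions; it assumes deterministic dynamics and a fixed sampling policy, which is how I read the paper's estimator):

$$
I(A_{1:T}; S_T \mid s_0) \;=\; H[S_T \mid s_0] \;-\; \underbrace{H[S_T \mid s_0, A_{1:T}]}_{=\,0 \text{ (deterministic dynamics)}} \;=\; H[S_T \mid s_0] \;\le\; \log \lvert \mathcal{R}_T(s_0) \rvert ,
$$

where $\mathcal{R}_T(s_0)$ is the set of states reachable from $s_0$ in $T$ steps under the sampling policy's support. Under uniform random actions this is simply the entropy of the reachable-state distribution; nothing in it reflects what a goal-directed (e.g., PPO-trained) human would actually do.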

Questions:
- What empirical evidence shows that disempowerment stems from the empowerment objective rather than from environment geometry or bottlenecks?
- The training setup of both the user and the assistant is highly artificial. Why is empowerment computed under a uniform policy instead of the PPO agents' policy distribution, and how is training the PPO agents against a random assistant justified? Similarly, why do non-adaptive user and bystander models make sense when the aim is to evaluate human-AI interactions?
- How sensitive are the results to the empowerment horizon T, the rollout count, or the stochasticity of the empowerment estimation?
- Were the empowerment, AvE, and choice-based objectives normalized to allow direct quantitative comparison?
- Could a lexicographic or max–min empowerment objective better preserve fairness than the summed joint-empowerment formulation?
- Why were impact-regularization baselines such as Stepwise Relative Reachability not included?
- How does the paper’s formulation relate to Multi-Principal Assistance Games, and would those frameworks predict or mitigate the same disempowerment effects?
- Are the results statistically significant across seeds or environment instances, and what tests were performed to confirm consistency?
- Would more cooperative or less constrained environments (where helping one human does not restrict another) produce the same trends?
Fully human-written |