ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 1 (25%) | 4.00 | 4.00 | 1829 |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 0 (0%) | N/A | N/A | N/A |
| Fully human-written | 3 (75%) | 4.67 | 3.67 | 2041 |
| Total | 4 (100%) | 4.50 | 3.75 | 1988 |
Paper: United Minds or Isolated Agents? Exploring Coordination of LLMs under Cognitive Load Theory
Review 1 (EditLens prediction: Fully human-written)

Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 1 (poor) | Rating: 2 (reject)
Confidence: 4 (confident, but not absolutely certain)

Summary:
This paper proposes a new multi-agent prompting framework for enhancing LLM reasoning, particularly on complex reasoning problems. The paper draws an analogy between in-context learning (ICL) failures in LLMs and cognitive load theory, claiming that ICL's limitations can be viewed as cognitive overload, where the working memory of the LLM is insufficient for the task's cognitive load. The paper then proposes CoThinker, which divides the cognitive load across multiple parallel agents and maintains a shared working memory. The CoThinker framework shows moderate improvements on LiveBench and CommonGen-Hard compared with direct IO, CoT, and previous multi-agent debating methods.

Strengths:
- The analogy from cognitive load theory is interesting and has the potential to open up new perspectives in understanding and improving large language models.
- A comprehensive pilot study and citations from the cognitive science perspective are provided.
- The empirical results show that the framework generalizes well across different LLMs.
- The paper is clearly written and easy to follow.

Weaknesses:
- Although the paper spends a large portion trying to justify the CLT-LLM analogy, my main concerns are as follows:
  - It is still not clear what "cognitive load" means in an LLM: what is the difference between "cognitive load" and simply "task difficulty"/"reasoning complexity"? Both attention entropy and perplexity only reflect the "uncertainty" of the model, which also correlates directly with the difficulty/complexity of the task.
  - It is not clear why the analogy helps: I find it hard to identify any unique insight (compared with existing work on prompting methods) that the CLT-LLM analogy brings. The final proposed method still boils down to a central meta-agent plus multiple sub-agents. I feel the analogy does not truly contribute to proposing a novel and effective method.
  - On LiveBench, the gains over the second-best baseline are very marginal, again posing a challenge to why this analogy helps.
- Some closely related work is missing from both the related-work discussion and the experiments:
  - Wang, Zhenhailong, et al. "Unleashing the emergent cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration." arXiv preprint arXiv:2307.05300 (2023).
  - Suzgun, Mirac, and Adam Tauman Kalai. "Meta-prompting: Enhancing language models with task-agnostic scaffolding." arXiv preprint arXiv:2401.12954 (2024).

Questions:
N/A
Review 2 (EditLens prediction: Heavily AI-edited)

Soundness: 2 (fair) | Presentation: 2 (fair) | Contribution: 2 (fair) | Rating: 4 (marginally below the acceptance threshold)
Confidence: 4 (confident, but not absolutely certain)

Summary:
This paper attempts to connect cognitive load theory with the limitations of large language models, and builds a multi-agent framework to mitigate cognitive overload through agent specialization and structured communication.

Strengths:
1. The identified issue of the mismatch between task complexity and model processing capability is important and deserves further exploration.
2. The experiments are comprehensive, with solid ablation studies and well-organized analyses.

Weaknesses:
1. Cognitive science is mainly used as rhetorical framing to justify a fairly standard multi-agent architecture (e.g., role assignment, communication bus, small-world topology). While it works, the CoThinker system lacks real novelty in design. I would expect cognitive theories to inspire genuinely new forms of multi-agent organization, rather than merely serving as interdisciplinary justification for existing designs.
2. If CLT is to be meaningfully applied to LLMs, the key questions should be: how to measure a model's working-memory capacity, how to quantify a task's cognitive load, and, most crucially, how to determine, in a quantifiable way, how tasks should be decomposed or allocated once both cognitive load and working-memory capacity are measurable. The paper does not directly address these points.
3. The validation experiments in Section 3 do not provide real evidence; they merely restate an obvious fact: harder tasks make the model less confident, and clearer instructions help with difficult problems.

Questions:
1. How do you envision quantitatively measuring "working memory capacity" in LLMs, beyond indirect proxies like attention entropy or perplexity?
2. Could the proposed framework adaptively estimate cognitive load and decide when to invoke multi-agent collaboration? (A toy sketch of such a policy follows this review.)
3. How sensitive is the performance of CoThinker to the chosen communication topology?
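Question 2 above gestures at a concrete mechanism. Below is a minimal sketch of what such adaptive invocation could look like, assuming a scalar load proxy (e.g., prompt perplexity) and hypothetical single-agent and multi-agent solver callables; none of this is from the paper, and the threshold would need calibration per model and task family.

```python
from typing import Callable

def solve_adaptively(
    task: str,
    load_proxy: Callable[[str], float],   # e.g., prompt perplexity (hypothetical)
    single_agent: Callable[[str], str],   # plain CoT solver (hypothetical)
    multi_agent: Callable[[str], str],    # CoThinker-style solver (hypothetical)
    threshold: float = 30.0,              # calibration point, not from the paper
) -> str:
    """Invoke multi-agent collaboration only when the estimated load is high;
    otherwise avoid the communication overhead of collaboration."""
    load = load_proxy(task)
    return multi_agent(task) if load > threshold else single_agent(task)
```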
Review 3 (EditLens prediction: Fully human-written)

Soundness: 2 (fair) | Presentation: 3 (good) | Contribution: 2 (fair) | Rating: 6 (marginally above the acceptance threshold)
Confidence: 3 (fairly confident; math/other details were not carefully checked)

Summary:
This paper investigates the performance limitations of Large Language Models (LLMs) on complex, multi-faceted tasks through the lens of Cognitive Load Theory (CLT) from cognitive science. The proposed multi-agent framework, CoThinker, operationalizes CLT principles by distributing intrinsic cognitive load through agent specialization and managing transactional load via structured communication and a collective working memory. Experiments on LiveBench and CommonGen-Hard demonstrate improved performance over the baselines, especially on high-cognitive-load tasks.

Strengths:
1. The application of Cognitive Load Theory to explain LLM limitations is novel and insightful, bridging human intelligence and machine intelligence.
2. The system design of the proposed method is clear: each component (agent specialization, the transactive memory system, and the communication moderator) directly maps to established principles for managing cognitive load in human collaborative systems.

Weaknesses:
1. The experimental comparisons omit some strong structured-reasoning and multi-agent baselines (e.g., Tree-of-Agents, agents with a leader). Statistical significance and variance across seeds are not consistently reported.
2. Although each part of the system can be mapped to CLT, these three parts are typical settings for multi-agent systems. What is missing is inspiration from CLT for designing the detailed algorithms of each component.
3. The manuscript includes a scalability analysis, but the experimental settings are too few to fully establish the correlation between the number of agents and performance.

Questions:
1. Can you provide quantitative confirmation of small-world properties in the communication graph? (See the sketch following this review for one way such a check could look.)
2. How sensitive are the results to the choice of N and β beyond the reported ranges? Could you include task-wise adaptive selection strategies?
3. Can you provide more insights from CLT to explain the details of the system design?
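For Question 1 above, a quantitative small-world check is straightforward with networkx: the sigma coefficient compares a graph's clustering and characteristic path length against degree-matched random graphs, with sigma > 1 suggesting small-world structure. A minimal sketch, using a Watts-Strogatz graph as a stand-in for the paper's actual communication topology:

```python
import networkx as nx

# Stand-in for a CoThinker-style communication graph: 8 agents on a
# rewired ring (Watts-Strogatz). Replace with the actual adjacency.
G = nx.connected_watts_strogatz_graph(n=8, k=4, p=0.3, seed=0)

# sigma = (C / C_rand) / (L / L_rand); values > 1 indicate small-world
# structure. niter/nrep are kept small here because the graph is tiny.
sigma = nx.sigma(G, niter=20, nrep=5, seed=0)
print(f"small-world coefficient sigma = {sigma:.2f}")
```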
Review 4 (EditLens prediction: Fully human-written)

Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 2 (fair) | Rating: 6 (marginally above the acceptance threshold)
Confidence: 4 (confident, but not absolutely certain)

Summary:
- This work explains the performance ceiling of in-context learning by comparing the LLM attention mechanism with human working memory: the LLM is subject to a cognitive load analogous to that in cognitive science, which can be measured by attention entropy and perplexity.
- Based on this CLT view of LLMs and known remedies for cognitive overload, the work introduces a multi-agent framework, CoThinker, consisting of parallel agent thinking, a transactive memory system, and a communication moderator. The experiments show the effectiveness of this framework in improving LLM performance on complex tasks.

Strengths:
- Cognitive Load Theory provides an explanation for LLM performance limits and a clear design rationale for multi-agent collaboration.
- CoThinker takes cognitive science principles, like working memory, collective cognition, and small-world communication, and turns them into practical, easy-to-understand tools. This connects human cognitive theory with machine collaboration.
- The authors tested their theory using quantitative measures (entropy, perplexity) and multiple benchmarks across different model families, showing that it is robust and works broadly. Detailed ablation studies (on communication moderators, TMS, and thinking styles) help identify which mechanisms do the most to cut down cognitive load.

Weaknesses:
- CoThinker underperforms on low-load tasks (e.g., instruction following) due to communication overhead, suggesting inefficiency in simple contexts.
- The chosen proxies for cognitive load (attention entropy, perplexity) are suggestive but indirect.

Questions:
- Can you give more explanation of the relationship between attention entropy and cognitive load? Why were attention entropy and perplexity chosen as cognitive-load proxies, and how might alternative measures (e.g., gradient variance) compare? (A sketch of how these two proxies are typically computed follows this review.)
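Since two reviews question the attention-entropy and perplexity proxies, a minimal sketch of how they are typically computed may help ground the discussion. This assumes a HuggingFace causal LM and averages row-wise attention entropy over layers, heads, and query positions; the paper's exact formulation may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)
model.eval()

def load_proxies(text: str) -> tuple[float, float]:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, labels=ids["input_ids"])
    # Perplexity: exp of the mean next-token cross-entropy.
    ppl = torch.exp(out.loss).item()
    # Attention entropy: -sum(p * log p) over each attention row,
    # averaged across layers, heads, and query positions.
    entropies = []
    for attn in out.attentions:              # each: (batch, heads, q, k)
        p = attn.clamp_min(1e-12)            # guard log(0) on masked entries
        entropies.append(-(p * p.log()).sum(dim=-1).mean())
    return ppl, torch.stack(entropies).mean().item()

ppl, ent = load_proxies("Plan a 7-city tour that minimizes total travel time.")
print(f"perplexity={ppl:.1f}, mean attention entropy={ent:.2f} nats")
```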