ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 1 (25%) | 4.00 | 5.00 | 2490 |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 1 (25%) | 6.00 | 4.00 | 3035 |
| Lightly AI-edited | 1 (25%) | 6.00 | 4.00 | 3451 |
| Fully human-written | 1 (25%) | 2.00 | 3.00 | 658 |
| Total | 4 (100%) | 4.50 | 4.00 | 2408 |
Individual Reviews
Review 1

Title: In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Soundness: 2 (fair) | Presentation: 4 (excellent) | Contribution: 3 (good)
Rating: 6 (marginally above the acceptance threshold)
Confidence: 4 (confident in the assessment, but not absolutely certain)
EditLens Prediction: Lightly AI-edited

Summary:
This paper studies latent source preferences in LLM-based agents: the authors hypothesize that models encode brand-level signals (e.g., reputation, follower counts) in their parametric knowledge and that those signals systematically bias which retrieved items the agent surfaces. They evaluate this via complementary direct tests (asking models which source they prefer) and indirect tests (showing semantically identical content with different source labels) across 12 models and three domains (news, research-paper selection, and seller choice), as well as realistic case studies. The authors uncover multiple interesting findings about the nature and implications of LLM source preferences.

Strengths:
1. The core research question, whether LLM-based agents carry latent source preferences that systematically influence which items they trust and retrieve, is largely novel. This is a specific type of model bias that has not been systematically studied in prior work, yet it is timely and highly relevant to realistic LLM applications.
2. The paper is well structured and easy to follow. Each research question is stated up front and directly answered with matched experiments and analyses, making the claims easy to verify.
3. The experiments are comprehensive. The authors combine direct and indirect tests, synthetic and realistic case studies, broad model coverage (12 LLMs), and diverse domains (news, research papers, e-commerce), which together give the results both depth and external validity.

Weaknesses:
1. The evaluation may be vulnerable to prompt-induced shortcutting: if the same phrasing (for instance, "select the article based on journalistic standards") is used across direct and indirect tests, models might be reacting to that cue rather than expressing a stable, content-independent source prior. Concretely, a model could learn that the phrase "journalistic standards" often co-occurs with examples from mainstream outlets during pretraining or instruction tuning and therefore surface those sources whenever the phrase appears. This would look like a latent source preference but would actually be a response to prompt wording.
2. During synthetic dataset construction, the authors use GPT-4o to generate and refine article variants; quantitative diversity metrics and/or human validation are needed to confirm that the generated items are sufficiently distinct.
3. The evaluated models include two smaller GPT-4.1 variants, which might undermine the validity of the findings, since models have been shown to generally prefer outputs from their own model family.

Questions:
1. Comparing the two case studies presented in Section 5, the authors find that prompting cannot reduce source bias in the news-aggregator setting, while it is effective when selecting Amazon sellers. Are there any insights into the cause and implications of this difference?
2. Since the models are also asked for a brief explanation during evaluation, did you observe any interesting patterns in their reasoning when they select sources?
3. Did you consider finding mechanistic explanations (within representations) for such latent source preferences using open-source models, to cross-validate your findings?
4. From line 285: "a model may seem to favor sources with fewer followers when asked directly, yet in practice it may assign more weight to higher follower counts." This seems rather counterintuitive. Do you have any plausible explanations?
Review 2

Title: In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Soundness: 2 (fair) | Presentation: 2 (fair) | Contribution: 2 (fair)
Rating: 2 (reject)
Confidence: 3 (fairly confident; some parts may not have been fully understood; math/other details not carefully checked)
EditLens Prediction: Fully human-written

Summary:
I don't think the paper fits the ICLR main ML community. The paper is findings after findings after findings, with no clear insights and no clear "so what" answers. I think the paper is a better fit for Scientific Reports than for ICLR.

Strengths:
The authors ran a broad set of experiments.

Weaknesses:
The paper has no clear take-away insights. It is more suited to a Scientific Reports-style venue than to ICLR. I would suggest the authors reconsider the target conference or journal. I would also advise the authors to read their paper carefully and think about what its main contributions and take-aways are. The writing misses the mark throughout the paper.
Review 3

Title: In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Soundness: 2 (fair) | Presentation: 2 (fair) | Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 5 (absolutely certain; very familiar with the related work and checked the math/other details carefully)
EditLens Prediction: Fully AI-generated

Summary:
This paper investigates latent source preferences in large language model (LLM)-based agents: systematic biases that lead models to favor certain sources (e.g., NYTimes, Nature, Amazon) over others when generating or recommending information. The authors conduct controlled and real-world experiments on 12 LLMs from six providers across domains such as news recommendation, research-paper selection, and e-commerce decisions. They find that (1) source preferences are strong, predictable, and persist even when content is identical; (2) these preferences are context-sensitive, varying with domain and framing; (3) LLMs inconsistently associate different brand identities (e.g., "@nytimes" vs. "nytimes.com"), creating vulnerabilities for impersonation; and (4) prompting strategies such as "avoid bias" fail to eliminate these tendencies. The study reveals that LLM agents encode trust hierarchies toward real-world entities, emphasizing the need for auditing, transparency, and controllable bias mechanisms in future agent design.

Strengths:
1. Introduces and formalizes the idea of "latent source preferences."
2. Broad coverage: 12 models, 6 providers, multiple domains, and both synthetic and real-world data.
3. Consistent results supported by rank-correlation and contextual-sensitivity analyses.
4. Ties directly to the alignment, fairness, and trustworthiness of LLM-based agents.
5. Appendices include detailed prompt templates, datasets, and a code-release commitment.

Weaknesses:
The paper stops short of causal analysis; it does not probe which stages of training (pretraining vs. instruction tuning) contribute most to preference formation. While the phenomenon is well characterized, the mitigation aspect is limited to showing that prompting fails; a deeper exploration of possible control mechanisms (e.g., debiasing or preference regularization) would strengthen the work. Some statistical results (e.g., the rationality correlations in Fig. 5) could be better explained with accompanying confidence intervals or ablation-based sensitivity checks. The work focuses primarily on English-language and Western-domain sources; future multilingual and cross-cultural extensions would enhance generalizability.

Questions:
1. Can the authors disentangle the contributions of pretraining data versus instruction-tuning datasets to these latent preferences?
2. Would fine-tuning on balanced or anonymized source data reduce these biases?
3. How would the results change for non-English or low-resource languages where brand representation is limited?
Review 4

Title: In Agents We Trust, but Who Do Agents Trust? Latent Preferences Steer LLM Generations
Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 3 (good)
Rating: 6 (marginally above the acceptance threshold)
Confidence: 4 (confident in the assessment, but not absolutely certain)
EditLens Prediction: Moderately AI-edited

Summary:
This paper investigates whether large language models (LLMs) possess "latent source preferences," meaning they systematically favor information based on the perceived reputation or brand identity of its source (e.g., news outlets, academic journals). Through controlled experiments on twelve LLMs, the study finds these preferences are strong, predictable, and context-sensitive, can outweigh the influence of the content itself, and persist even when models are prompted to be unbiased.

Strengths:
1. The study validates its hypothesis with an extensive empirical evaluation across twelve distinct LLMs from six different providers, spanning synthetic and real-world tasks including news, research, and e-commerce.
2. The paper effectively isolates the phenomenon by complementing direct preference queries with a rigorous "indirect evaluation" methodology, which uses semantically identical content to disentangle latent source bias from content-driven effects.
3. The work addresses a novel and critical gap by focusing on how LLMs select and present information rather than just what they generate, demonstrating in real-world case studies that these preferences can dominate content and explain observed political skews.

Weaknesses:
1. To better situate the paper's contribution, the Related Work section should explicitly differentiate its findings from key studies on LLM cognitive biases, such as [1-3]. A clearer discussion is needed of how "latent source preference" (a bias toward external entities) differs from biases originating in pretraining vs. finetuning [1], emergent cognitive biases induced by instruction tuning [2], and existing cognitive debiasing techniques focused on reasoning [3]. This would more effectively highlight the novelty of the current work.
2. A significant concern arises regarding the paper's strong conclusion from the AllSides case study, namely that source preference "can completely override the effect of the content itself" and that the observed "left-leaning skew" is "largely attributable" to source trust. This claim appears to be undermined by the study's own control data. In the critical "Source Hidden" condition (Fig. 6), the models already exhibit a clear preference for articles originating from left-leaning and centrist sources, even when no source information is provided. This strongly suggests that the content itself (e.g., writing style, topic selection, or alignment with the models' RLHF training) is a significant confounding variable that introduces a substantial skew before source attribution is considered. Therefore, a more rigorous and defensible interpretation is that latent source preferences amplify or reinforce a pre-existing content-driven bias, rather than "overriding" it or being its primary cause.

References:
[1] Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
[2] Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias
[3] Cognitive Debiasing Large Language Models for Decision-Making

Questions:
None