ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 1 (25%) | 6.00 | 4.00 | 2675 |
| Fully human-written | 3 (75%) | 4.00 | 3.33 | 2898 |
| Total | 4 (100%) | 4.50 | 3.50 | 2842 |
Individual Reviews
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
Given the phenomenon in LRMs that reasoning traces often contradict final answers, this paper investigates which factors influence reasoning ability, framed as a conflict between retrieval from the LRM's internal knowledge and the chain-of-thought (external prompting).
1. The authors find that both the CoT and internal knowledge contribute to the final answers.
2. The authors investigate how several competing factors (model size, reasoning problem domain, reasoning-model training method) influence the reasoning abilities of LLMs. They also validate the "post-hoc explanation" phenomenon (models fabricate reasoning steps to justify a false answer), further showing that reasoning ability distilled from LRMs is not as reliable as that obtained via RL.
3. To validate the findings above, the authors suggest applying RL on reasoning-intensive datasets to prevent the model from retrieving answers from its own knowledge. They propose to train the model with a knowledge-unlearning method to forget specific knowledge (specifically, GRPO combined with NPO).

Strengths:
1. The authors conduct sufficient empirical experiments to validate how CoT prompts and internal model knowledge influence the final results.
2. The authors conduct "post-hoc explanation" experiments to cross-validate the results of the previous methods.
3. The authors leverage an unlearning method, adding NPO after GRPO, to demonstrate that weakening the knowledge (retrieval) ability of LRMs enhances the reasoning ability of LLMs, in terms of reasoning robustness and effectiveness. The authors also perform an extra evaluation of reasoning-path quality.

Weaknesses:
1. Knowledge and reasoning ability may still be hard to decouple for analysis. The SFT attack may also harm the CoT reasoning ability: as the experiments in Table 1 demonstrate, SFT gives a worse R-PSR than the original R1-Llama-8B. The authors only assert, in Line 150, the assumption that this impact would be small.
2. Some claims do not seem well supported:
   * In Line 48, this research still does not seem to deliver "a mechanistic understanding of how different capabilities jointly influence LRMs’ answer generation".
   * Lines 360-362 are hard to follow: "our findings reveal a challenge where the retrieval mechanism enables models to 'hack' the reward signal during RL and impair its effectiveness". If you estimate $\delta = \text{T-PSR} - \text{PER}$, which accounts for the LRM's own reasoning failures, it is even larger than PER.
3. The current study mainly focuses on multiple-choice QA; more open-ended problems could be studied in the future.

Questions:
1. The `Attention Patterns` section seems unrelated to the other sections; in particular, its results are not used for RQ3. What is the purpose of this section?
2. Some experimental settings are unclear:
   * How did the authors obtain the results of Fig. 2 on the different datasets?
   * What are the experimental settings behind the RQ3 results in Table 1? The earlier "perturbation attack" was only conducted on questions that the LLM can solve correctly (a generic sketch of this style of metric is given after this review).
3. The authors aim to intervene on the "retrieval knowledge shortcut" to enhance the reasoning ability of models. However, the actual reasoning accuracy may decrease, since the model still needs the corresponding basic knowledge to derive the results. What if the authors test the effect of unlearning some intermediate knowledge used by the CoT, for those $y_r$ or $y$ results?
4. Since reasoning ability can be enhanced when LLMs unlearn the final-answer knowledge, could the LLMs achieve better accuracy on harder problems if they are given access to external knowledge?

EditLens Prediction: Fully human-written
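For concreteness, the perturbation-success-rate style of metric referred to above (computed only over items the model answers correctly without perturbation) can be sketched as follows. This is a generic illustration with assumed field names, not the paper's exact definition of R-PSR or T-PSR:

```python
def perturbation_success_rate(records):
    """Generic PSR-style metric: among items the model solves correctly
    without perturbation, the fraction whose answer flips to the injected
    (misleading) answer under perturbation. Field names are illustrative."""
    eligible = [r for r in records if r["clean_correct"]]
    if not eligible:
        return float("nan")
    flipped = sum(r["perturbed_answer"] == r["injected_answer"] for r in eligible)
    return flipped / len(eligible)
```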
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
Large Reasoning Models (LRMs) generate answers using two competing mechanisms: a) Chain-of-Thought (CoT) reasoning and b) memory retrieval. These mechanisms can conflict, leading to inconsistencies between reasoning traces and final answers. To show that LRMs use both reasoning and retrieval simultaneously, the authors design controlled perturbation experiments that perturb either reasoning or retrieval: the reasoning steps are subtly altered to be misleading or incorrect, or the model's memory is poisoned with misleading cues. They find that smaller or distilled models are more vulnerable to retrieval perturbations and may fabricate reasoning traces to justify retrieved answers. In comparison, larger models and those trained with reinforcement learning are more robust and reasoning-driven. Based on these experiments, the authors propose FARL (Forgetting-Augmented Reinforcement Learning) to: a) suppress retrieval shortcuts, b) enhance reasoning-dominant behavior, and c) improve generalization and robustness. It achieves a 47.8% improvement in CoT robustness, a 22.8% accuracy gain on in-domain tasks, and a 5.8% accuracy gain on out-of-domain tasks.

Strengths:
The methodology is rigorous, including controlled perturbation experiments and attention-head analysis, which strengthens the validity of the findings. Understanding the interplay between reasoning and retrieval is critical for advancing trustworthy AI, especially in high-stakes domains like math, logic, and scientific reasoning.

Weaknesses and questions:
1. The paper primarily focuses on math and logic tasks to evaluate reasoning vs. retrieval. The evaluations are limited to multiple-choice QA and may not generalize to open-ended QA. Please add an evaluation on free-form answers with verifiable graders, such as [GeneralThought](https://huggingface.co/datasets/RJT1990/GeneralThoughtArchive).
2. R-PSR and T-PSR are correlational indicators, not causal evidence of pathway dominance. A misleading cue that flips an answer does not prove the answer was reasoning-driven. Can we add causal intervention experiments (e.g., targeted weight/activation ablations on putative retrieval heads; causal scrubbing on residual streams) to show counterfactual dependence of the final answer on each pathway?
3. Metrics like R-PSR and T-PSR are binary and may not capture nuanced interactions between reasoning and retrieval. For example, a model might partially rely on both mechanisms in a non-exclusive way. What if we add token-level attributions such as logit-lens analyses over steps?
4. The extraction of answers and judgment of reasoning correctness occasionally fall back to GPT-4o-mini for answer extraction (Section 3.2). Does misclassification inflate perturbation success? I suggest that the authors add a human-validated subset or at least a majority-vote ensemble judge.
5. How does NPO work in FARL? Did you input all x and y to the NPO? How does it compel models to "forget" specific memorized answers? (A generic sketch of the standard NPO objective is given after this review.)
6. Unify b_x (line 4) and x (line 9) in Algorithm 1 to make it easier to understand.

EditLens Prediction: Fully human-written
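For reference on the NPO question above, a minimal sketch of the standard Negative Preference Optimization objective on a forget set, written over sequence log-probabilities; this is an assumed, generic formulation, not the paper's exact FARL loss, and whether all (x, y) pairs are fed to it is precisely what the review asks the authors to clarify:

```python
import torch.nn.functional as F

def npo_loss(policy_logps, ref_logps, beta=0.1):
    """Standard NPO objective on forget-set sequences (sketch only).
    policy_logps / ref_logps: summed token log-probabilities of the
    memorized answer y given x under the current policy and a frozen
    reference model, respectively (1-D tensors over a batch)."""
    log_ratio = policy_logps - ref_logps  # log pi_theta(y|x) - log pi_ref(y|x)
    # Equivalent to (2/beta) * E[log(1 + (pi_theta/pi_ref)**beta)],
    # which pushes policy probability of memorized answers below the reference.
    return -(2.0 / beta) * F.logsigmoid(-beta * log_ratio).mean()
```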
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper focuses on the two primary competing capabilities that influence the final answers of LRMs: deliberate reasoning through CoT and direct retrieval from internal memory. The paper identifies the key factors that determine the dominance of reasoning versus retrieval through controlled intervention experiments, and ultimately proposes a post-training method, FARL, to regulate the relative strength of these two capabilities.

Strengths:
1. The competitive interplay between reasoning mechanisms and memory retrieval in LRMs is an important and timely research topic.
2. The paper is well-structured and clearly organized.
3. I like the idea and design of FARL; it effectively validates the paper's conclusions and is a good contribution to the study.

Weaknesses:
1. In Section 3.1, I question whether SFT can serve as a reasonable method for directly modifying a model's memory. While SFT does increase the probability of target tokens during deep-layer processing, evidence from mechanistic interpretability research [1, 2] suggests that it does not directly alter the MLP-stored factual knowledge or modify the model's internal retrieval mechanisms. Therefore, the reliability of SFT as a means of memory intervention has a direct impact on the validity of the experimental conclusions in this work.
2. The investigation of the reasoning mechanism in LRMs is limited to injecting misleading cues into the CoT and observing the model's response across domains. However, the paper does not examine the **intrinsic** reasoning behavior of LRMs within those domains. For instance, in mathematical tasks, the interplay between the model's **inherent** mathematical reasoning and its memory-recall mechanism is not explored.
3. Similarly, the study of retrieval mechanisms introduces new "memory" through SFT, rather than examining the model's inherent knowledge-retrieval capabilities.
4. The experiments are restricted to multiple-choice tasks. I would like to see results on open-ended generation settings as well, as such tasks would more naturally reflect the model's reasoning and retrieval interplay in real-world scenarios.
5. I appreciate the problem studied in this paper. Although I suspect some inspiration may come from *Competition of mechanisms: Tracing how language models handle facts and counterfactuals*, I see this as acceptable. Still, consistent with Point 4, I would like to see results across more diverse task formats to strengthen the conclusions.

Overall, I appreciate the idea presented in this work and would be happy to reconsider my rating if the above concerns are addressed sufficiently.

Questions:
Please see the weaknesses above.

EditLens Prediction: Lightly AI-edited
Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This paper presents a mechanism study of the chain-of-thought process of Large Reasoning Models (LRMs). The authors conduct controlled experiments to analyze whether an LRM generates its answer by CoT reasoning or by memory retrieval. On that basis, the paper answers three research questions:
RQ1: Do LRMs employ reasoning and retrieval simultaneously to derive answers?
RQ2: What factors influence the dominance of one capability over the other?
RQ3: How can we control the relative strength of these capabilities?
Building upon those findings, the authors propose a novel fine-tuning framework that integrates memory unlearning with reinforcement learning and enhances generalizable reasoning capabilities.

Strengths:
The experiments cover different model sizes, architectures, and training paradigms, which makes the conclusions relatively plausible. The paper employs a machine-unlearning method to optimize the training process, which enhances generalizable reasoning capabilities.

Weaknesses:
Although the experiments are conducted thoroughly, the memory and reasoning perturbation mechanisms are not convincing enough and need further discussion; see the questions below for details.

Questions:
1. The reasoning perturbation is achieved by placing an erroneous answer at the end of the thinking phase. However, this design seems simple or naive, since the transformer mechanism normally focuses on recent tokens (and attention sinks), which could make the LRM ignore the previous thinking process and take a shortcut (see the sketch after this review).
2. The memory perturbation fine-tunes on QA pairs to change the memories of the LRM. Although the authors claim that the fine-tuning phase is restricted to relevant knowledge and minimizes side effects, I am not sure how this can be guaranteed. For example, it could deteriorate the reasoning capability of the LRM, which might in turn weaken the experimental conclusions. More ablations would be beneficial.
3. The paper disentangles reasoning and memory by whether the LRM is allowed to think or not. Although this is simple and effective, some knowledge could still be retrieved from the LRM's memory during the reasoning phase.
4. The FARL experiments lack a discussion of the training dataset.

If those concerns are settled properly, I am willing to raise my score.

EditLens Prediction: Fully human-written
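To make Question 1 concrete, a minimal sketch of the kind of reasoning perturbation described there, where a misleading answer is appended near the end of the thinking phase before the model commits to a final answer. The tag format and cue wording are illustrative assumptions, not the paper's exact prompt:

```python
def perturb_reasoning(question, cot_trace, misleading_answer):
    # Append a misleading cue at the end of the thinking phase, then ask for
    # the final answer. A recency-biased model may follow the cue and ignore
    # the earlier (correct) reasoning, which is the shortcut the review
    # raises as a concern.
    cue = f"Wait, the correct option is actually {misleading_answer}."
    return (
        f"{question}\n"
        f"<think>\n{cot_trace}\n{cue}\n</think>\n"
        "Final answer:"
    )
```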