ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction    Count      Avg Rating   Avg Confidence   Avg Length (chars)
Fully AI-generated     1 (33%)    6.00         3.00             1907
Heavily AI-edited      0 (0%)     N/A          N/A              N/A
Moderately AI-edited   0 (0%)     N/A          N/A              N/A
Lightly AI-edited      1 (33%)    4.00         4.00             2544
Fully human-written    1 (33%)    2.00         2.00             1061
Total                  3 (100%)   4.00         3.00             1837
Submission: From Offline to Online Memory-Free and Task-Free Continual Learning via Fine-Grained Hypergradients
Review 1 (EditLens Prediction: Fully human-written)

Soundness: 3: good
Presentation: 3: good
Contribution: 1: poor
Rating: 2: reject
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper leverages hypergradients for continual learning. The key idea is to use prototypes as memory and hypergradients for adaptive learning-rate selection.

Strengths: The paper is clearly written. Experiments show the effectiveness of the proposed method under the proposed setting.

Weaknesses:
Limited novelty: neither the hypergradient nor the prototype-based memory is new; both have been widely used in prior work for adapting learning rates (https://arxiv.org/pdf/1703.04782) and preventing forgetting (https://arxiv.org/pdf/2308.00301).
Experimental setting and claims: this work claims to be online and memory-free, yet it uses cached prototypes, which is itself a form of memory. Without comparing against other methods under the same compute, memory, and storage budgets, it is not fair to claim the performance gain. Moreover, despite its more complex implementation, the method is only slightly better than simple ER while relying on heavy hyper-parameter tuning, which is prohibitive in the online CL scenario. This makes both the setup and the method far from practical.

Questions: NA
Review 2 (EditLens Prediction: Fully AI-generated)

Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper addresses the migration of offline memory-free continual learning (CL) methods to online, memory-free, and task-free CL scenarios. It introduces a prototype-based auxiliary memory module (P) and a fine-grained hypergradient mechanism (FGH) that dynamically addresses gradient imbalance and learning-rate sensitivity. Experiments on CIFAR-100, CUB, and ImageNet-R show consistent gains across multiple baselines under multi-learning-rate evaluation. The work is practically motivated and conceptually coherent, offering a bridge between offline and online CL paradigms.

Strengths:
1) The topic is timely and relevant, targeting the underexplored Offline→Online transition in CL with clear theoretical and practical significance.
2) The proposed P+FGH framework effectively addresses two core challenges of online CL, catastrophic forgetting and gradient imbalance, through a minimally intrusive and generalizable design.
3) Experiments are comprehensive, covering diverse datasets and learning-rate settings, demonstrating the method's robustness and transferability.

Weaknesses:
1) The online scenario remains quasi-online, relying on pre-defined task splits rather than fully stream-based settings, limiting realism.
2) The novelty of both P and FGH is moderate: the prototype update mirrors CoPE (2021), and FGH lacks a formal convergence or stability analysis and clear differentiation from prior hypergradient methods.
3) Recent baselines (e.g., PROL 2025, PMLR 2025) are missing, and parameter details (γ, β₁/β₂, Si-Blurry settings) are insufficiently reported, affecting reproducibility and fairness.

Questions:
1) How would the proposed FGH behave under fully stream-based or class-reappearance settings?
2) Has γ been systematically tuned or theoretically analyzed for robustness across datasets?
3) Can the authors quantify FGH's computational overhead compared to existing hypergradient or adaptive-LR optimizers?
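The prototype memory questioned in weakness 2 is, in CoPE-style methods, essentially an exponential moving average of per-class feature means. The sketch below illustrates only that general idea; the function name, momentum value, and tensor shapes are assumptions for illustration, not details taken from the submission.

```python
import torch

def update_prototypes(prototypes, features, labels, momentum=0.9):
    """EMA-style class-prototype update (CoPE-like): each class keeps a
    running mean of its embeddings, later usable for replay or regularization.

    prototypes: dict {class_id: 1-D feature tensor of shape (D,)}
    features:   batch embeddings, shape (B, D)
    labels:     integer class ids, shape (B,)
    """
    for cls in labels.unique().tolist():
        # mean embedding of this class within the current batch
        cls_mean = features[labels == cls].mean(dim=0).detach()
        if cls in prototypes:
            prototypes[cls] = momentum * prototypes[cls] + (1 - momentum) * cls_mean
        else:
            prototypes[cls] = cls_mean
    return prototypes
```

Whether such a cached-prototype store should count as "memory-free" is exactly the point raised in Review 1.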
Review 3 (EditLens Prediction: Lightly AI-edited)

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: The core objective of this paper is to address a challenging problem in the field of continual learning (CL): how to successfully adapt effective memory-free algorithms from idealized offline (offCL) settings to more realistic and difficult online, task-free (onCL) environments. To address gradient imbalance, the paper proposes its core innovation, Fine-Grained Hypergradients (FGH), a novel optimization technique based on two key ideas:
+ Learning an independent, dynamic gradient weight for each parameter of the model.
+ Leveraging the gradient directions from two consecutive iterations to assess learning stability: if the directions are aligned, the update step is amplified; conversely, if they are opposed (indicating oscillation), the update is suppressed.

Strengths:
1. The problem addressed by the paper, online memory-free task-free continual learning, is a highly challenging and practically significant direction in the current field.
2. The combined framework proposed in this paper achieves outstanding performance in experiments, especially under the 'multi-learning-rate evaluation' paradigm designed by the authors, showcasing the robustness of the method.

Weaknesses:
1. The entire work can be viewed as an effective combination of two known techniques (prototypes and hypergradient descent), making the contribution more empirical than conceptual. The performance improvement from FGH largely stems from enhancing plasticity in the online setting; Equation (7) progressively increases the intra-task learning rate to boost plasticity, a mechanism that has been explored in prior work [1].
2. Regarding catastrophic forgetting, the method essentially relies on prototype replay, which is also a common technique in previous literature. For a venue like ICLR, which seeks fundamental innovations, this contribution carries insufficient weight.
3. The authors use Adam in their experiments. From a learning-rate perspective, could Adam and FGH conflict? Could a situation arise where Adam suggests a large learning rate while FGH suggests a small one? In other words, do FGH and Adam work synergistically, or is there functional redundancy? Given the prevalence of Adam, the authors should have included a discussion of this.
4. The authors should provide a comparative experiment between a "global FGH" and the proposed "fine-grained FGH" to demonstrate the necessity of the fine-grained design.

[1] Online Learning Rate Adaptation with Hypergradient Descent, ICLR 2018.

Questions: See the weaknesses.
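For context on [1]: hypergradient descent adapts a single global learning rate using the inner product of consecutive gradients, whereas the fine-grained variant described in the summary keeps one multiplier per parameter, driven by element-wise agreement of consecutive gradients. The sketch below is a rough illustration of that idea only; the sign-based update, clamping bounds, and all names are illustrative assumptions, not the submission's actual implementation.

```python
import torch

def fine_grained_hypergradient_step(params, prev_grads, weights,
                                    lr=0.01, beta=1e-3,
                                    w_min=0.1, w_max=10.0):
    """One illustrative update: each parameter keeps its own multiplier `w`,
    grown where consecutive gradients agree element-wise and shrunk where
    they disagree, i.e. hypergradient descent [1] applied per coordinate."""
    new_prev = []
    for p, g_prev, w in zip(params, prev_grads, weights):
        g = p.grad
        if g_prev is not None:
            # element-wise agreement of consecutive gradient directions
            w.add_(beta * torch.sign(g * g_prev))
            w.clamp_(min=w_min, max=w_max)
        p.data.add_(-lr * w * g)  # per-parameter scaled SGD step
        new_prev.append(g.detach().clone())
    return new_prev
```

In practice `weights` would be initialized to ones (e.g., `[torch.ones_like(p) for p in params]`) and `prev_grads` to `[None] * len(params)`. Collapsing `w` to a single scalar shared across all parameters recovers the "global FGH" baseline requested in weakness 4, and interleaving this rescaling with Adam's own per-parameter adaptation is the potential redundancy raised in weakness 3.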