ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 1 (25%) | 2.00 | 4.00 | 3451 |
| Fully human-written | 3 (75%) | 3.33 | 3.67 | 2488 |
| Total | 4 (100%) | 3.00 | 3.75 | 2729 |
Individual Reviews (4 total)
Title: Parameter-Efficient Subspace Optimization for LLM Fine-Tuning

Ratings: Soundness: 2: fair. Presentation: 2: fair. Contribution: 2: fair. Rating: 2: reject. Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper proposes Parameter-Efficient Subspace Optimization (PESO), a unifying framework that connects modern parameter-efficient fine-tuning methods for large language models (LLMs), such as LoRA, with the classical theory of subspace optimization. PESO provides a principled foundation that interprets these methods through an exploration–exploitation trade-off in the subspace, leading to new algorithms that are both memory-efficient and equipped with strong convergence guarantees.

Strengths:
- Provides a framework that covers several existing low-rank fine-tuning approaches.
- The paper is generally well written and easy to follow.

Weaknesses: While the paper claims contributions at the conceptual, theoretical, and empirical levels, these contributions appear insufficiently substantiated.
1. **Conceptual novelty.** The subspace minimization perspective is not new. This viewpoint was already established in GaLore [A1] and more recently revisited in Randomized Subspace Optimization (RSO) [A2]. In particular, the proposed framework in Equation (3) closely resembles RSO, where a low-rank variable $\xi$ is obtained by solving a subproblem and then added back to the base parameter $W$. The authors are encouraged to clearly articulate the distinctions between their framework and the RSO algorithm.
2. **Theoretical contribution.** The convergence analysis is weak and incomplete. Numerous existing works provide both exact convergence guarantees and explicit convergence rates, such as RSO [A2], LDAdam [A3], SARA [A4], and RAC-LoRA [A5]. By contrast, the proposed algorithm only achieves convergence to a biased solution that depends on $\delta$, without establishing exact convergence or a convergence rate. This is a major concern regarding the paper's theoretical rigor.
3. **Experimental evaluation.** The empirical results are not comprehensive. The paper omits comparisons with recent strong baselines, including LDAdam, SARA, RAC-LoRA, and APOLLO [A6], which have demonstrated strong performance in both pre-training and fine-tuning settings.
4. **Assumptions.** In Lines 127–136, the authors argue that prior works rely on unrealistic assumptions such as $r < m$ or random projections. However, Assumptions 4 and 5 in this paper are themselves non-standard and not commonly adopted in the literature. It is therefore unconvincing to claim that the present assumptions are more natural or milder than those in existing studies.

[A1] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
[A2] A Memory Efficient Randomized Subspace Optimization Method for Training Large Language Models
[A3] LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
[A4] Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
[A5] Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation
[A6] APOLLO: SGD-like Memory, AdamW-level Performance

Questions:
1. Clearly state the differences from existing subspace optimization methods such as RSO [A2].
2. Establish the exact convergence of the proposed algorithm and its convergence rate, and compare the rate with the existing literature.
3. Conduct experiments with stronger baselines such as LDAdam, SARA, RAC-LoRA, and APOLLO.

EditLens Prediction: Fully human-written
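For context on the subspace-minimization template that this review compares PESO with: a generic form, reconstructed from the review's description rather than from the paper itself (the dimensions $W_k \in \mathbb{R}^{m \times n}$, $P_k \in \mathbb{R}^{m \times r}$, and the inexact arg-min are assumptions), is

```latex
% Generic subspace-minimization step (illustrative; the paper's Eq. (3) may differ):
% choose a basis P_k, solve a low-dimensional subproblem, add the result back to W_k.
\xi_k^{\star} \;\approx\; \operatorname*{arg\,min}_{\xi \in \mathbb{R}^{r \times n}}
    f\!\left(W_k + P_k\, \xi\right),
\qquad
W_{k+1} \;=\; W_k + P_k\, \xi_k^{\star}.
```

Under this reading, methods in this family differ mainly in how $P_k$ is chosen (e.g., gradient singular directions in GaLore versus random sketches in RSO) and in how the inner subproblem is solved, which is the sense in which the review asks how PESO's Equation (3) differs from RSO.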
Title: Parameter-Efficient Subspace Optimization for LLM Fine-Tuning

Ratings: Soundness: 2: fair. Presentation: 2: fair. Contribution: 2: fair. Rating: 2: reject. Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper proposes PESO, a LoRA-like PEFT algorithm motivated by subspace optimization. PESO alternates between exploring new subspaces via low-rank SVD, as in GaLore, and exploiting the current subspace via Adam updates. The paper also provides theoretical convergence proofs and conducts experiments to demonstrate PESO's efficiency.

Strengths:
1. PESO achieves better parameter efficiency and performance than vanilla LoRA, as illustrated in the experiments.
2. The algorithm is new, and a theoretical convergence proof is provided.

Weaknesses:
1. In Lines 109–111, the authors claim that "The resulting algorithm is, to our knowledge, the first memory-efficient method for LLM training with provable convergence to full-parameter optimality up to small errors, without additional assumptions such as explicit low-rankness of the solution." However, prior works have already established exact convergence rates for memory-efficient LLM training methods under standard or mild assumptions, including GoLore [arXiv:2410.11289] and LDAdam [arXiv:2410.16103], both of which were uploaded to arXiv one year ago. Consequently, given the non-diminishing convergence gap in Theorem 5.1 and the existence of these prior works, I strongly disagree with this claim.
2. The assumptions in the convergence analysis are too strong. Specifically, the approximation error $\delta_k$ can diverge if the gradient $G_k$ diverges. The present proofs cannot exclude the case where $\lim_{k\rightarrow\infty}\delta_k=\lim_{k\rightarrow\infty}\|G_k\|_F=\infty$, and thus I believe Assumption 4 is a strong assumption.
3. The improvements of PESO over the baselines in the experiments are limited. Other subspace optimization algorithms such as GaLore [arXiv:2403.03507], GoLore [arXiv:2410.11289], LDAdam [arXiv:2410.16103], Fira [arXiv:2410.01623], and SubTrack++ [arXiv:2502.01586] have similar memory efficiency and much stronger performance than LoRA. It is recommended to include at least some of these strong baselines in the experiments.

Questions:
1. See Weakness 1. Can the authors provide more evidence to support the claim?
2. See Weakness 2. Can the authors give a more detailed explanation of why Assumption 4 holds?
3. See Weakness 3. Is PESO empirically comparable to, or better than, the memory-efficient baselines mentioned above?

EditLens Prediction: Fully human-written
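To make the explore/exploit structure this review describes concrete, below is a minimal PyTorch sketch of the general pattern: periodically recompute a rank-r basis from an SVD of the gradient, then run Adam-style updates inside that subspace. The function names, the restart period K, and the use of `torch.svd_lowrank` are illustrative assumptions; this is not the paper's Algorithm 2 or 3.

```python
# A minimal sketch (not the authors' code) of the explore/exploit pattern the
# review describes: GaLore-style SVD exploration of a rank-r subspace, then
# Adam-style exploitation inside that subspace. K, r, and all names are assumed.
import torch

def peso_like_training(W, grad_fn, steps=1000, K=200, r=8, lr=1e-3,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """W: (m, n) weight matrix; grad_fn(W) returns the full gradient dL/dW."""
    P = None          # current rank-r subspace basis, shape (m, r)
    s = 0             # steps taken inside the current subspace
    for t in range(steps):
        G = grad_fn(W)                              # full gradient, (m, n)
        if P is None or t % K == 0:                 # --- explore: pick a new subspace ---
            U, _, _ = torch.svd_lowrank(G, q=r)     # top-r left singular directions of G
            P = U
            m1 = torch.zeros(r, G.shape[1])         # reset low-dimensional Adam state
            m2 = torch.zeros(r, G.shape[1])
            s = 0
        g = P.T @ G                                 # projected gradient, (r, n)
        s += 1
        m1 = beta1 * m1 + (1 - beta1) * g           # --- exploit: Adam inside the subspace ---
        m2 = beta2 * m2 + (1 - beta2) * g * g
        update = (m1 / (1 - beta1 ** s)) / ((m2 / (1 - beta2 ** s)).sqrt() + eps)
        W = W - lr * (P @ update)                   # lift the low-rank step back to full W
    return W
```

Note that only the (r, n) optimizer state and the (m, r) basis are stored, which is the source of the memory savings relative to full-parameter Adam.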
Title: Parameter-Efficient Subspace Optimization for LLM Fine-Tuning

Ratings: Soundness: 2: fair. Presentation: 2: fair. Contribution: 2: fair. Rating: 2: reject. Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper introduces PESO (Parameter-Efficient Subspace Optimization), a unifying framework for parameter-efficient fine-tuning of LLMs grounded in classical subspace minimization. PESO connects methods like LoRA and GaLore to a principled exploration–exploitation paradigm for memory-efficient optimization with provable convergence in the full parameter space. The authors instantiate PESO into practical variants, PESO-LoRA-R and PESO-LoRA-T.

Strengths:
1. The PESO framework bridges PEFT with classical subspace minimization, offering an exploration–exploitation perspective and a unified Algorithm 1 that generalizes several existing methods.
2. PESO-LoRA-R and PESO-LoRA-T emerge as straightforward, practical special cases derived directly from the framework.
3. The paper presents theoretical guarantees for full-rank convergence under the stated assumptions.
4. The method is empirically evaluated through Llama-2-7B pre-training and multiple benchmark experiments.

Weaknesses:
1. Since the core theme of the paper revolves around exploration–exploitation, it would be natural to include targeted ablation studies, particularly examining the effects of the restart frequency (K), the rank (r), and related parameters.
2. Although the paper positions itself as a unifying framework, it lacks in-depth discussion and comparison with key baselines in this area, notably GaLore [1] and other state-of-the-art methods.
3. (Please correct me if I am mistaken,) but M appears to be defined inconsistently, once as a projection map and elsewhere as a subspace. The notation would benefit from clearer, more consistent presentation. Additionally, there are minor grammatical issues (e.g., Line 38: "Therefore, updating the entire ...").
4. The alignment techniques are central to the proposed algorithm and should be discussed thoroughly in the main text rather than being deferred to the appendix.
5. The method is evaluated primarily against LoRA variants, but several recent strong baselines, including GaLore [1], APOLLO [2], LDAdam [3], Fira [4], etc., are missing. Moreover, SubTrack++ [5], which also explores identifying optimal subspaces via geometric insights, appears conceptually related to the exploration phase and warrants discussion.
6. The evaluation results are not fully convincing, as the baselines mentioned in point 5 typically outperform LoRA variants. This raises concerns about whether the proposed algorithms offer substantial improvements or meaningful advantages.
7. The computational efficiency of the proposed methods is not addressed; in particular, time and memory costs should be analyzed, given that SVD operations are often computationally expensive.
8. The proposed variants require clearer exposition in the main text, including detailed explanations and mathematical formulations of the optimizers and steps used in Algorithms 2 and 3. The current presentation includes repetitive content, while several important details are relegated to the appendix.

[1] Zhao et al., 2024. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
[2] Zhu et al., 2025. APOLLO: SGD-like Memory, AdamW-level Performance.
[3] Robert et al., 2025. LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics.
[4] Chen et al., 2024. Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
[5] Rajabi et al., 2025. SubTrack++: Gradient Subspace Tracking for Scalable LLM Training.

Questions: Please refer to the weaknesses.

EditLens Prediction: Lightly AI-edited
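On weakness 3 above (the notation for M): one consistent convention, offered here purely as an illustration since the paper's own definitions are not reproduced in the review, is to reserve one symbol for the subspace and another for the projection map onto it:

```latex
% Illustrative disambiguation (an assumption, not the paper's definition):
% S is the subspace, M is the projection map onto it.
\mathcal{S} \;=\; \operatorname{range}(P), \qquad P \in \mathbb{R}^{m \times r},\; P^{\top} P = I_r,
\qquad
\mathcal{M}(G) \;=\; P P^{\top} G .
```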
Title: Parameter-Efficient Subspace Optimization for LLM Fine-Tuning

Ratings: Soundness: 3: good. Presentation: 3: good. Contribution: 3: good. Rating: 6: marginally above the acceptance threshold. Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: In this paper, the authors introduce a unifying framework, Parameter-Efficient Subspace Optimization (PESO). This framework covers many existing methods, such as LoRA, and bridges them with the algorithms and theory of subspace optimization.

Strengths: The strengths of this paper are summarized as follows:
1. It unifies multiple Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, AdaLoRA, and GaLore, under a single mathematical view.
2. Theoretically, it gives the first proof of a full-parameter convergence guarantee for a memory-efficient fine-tuning method; the convergence guarantee holds in the full model weight space.
3. The proposed framework, PESO, is practical. It is a plug-and-play design and can improve existing PEFT methods with very little modification, which could be impactful in this field.

Weaknesses: The weaknesses of this paper are summarized as follows:
1. The experimental results are based on T5-base and LLaMA-2-7B. It would be better if the authors could include results on more models, such as LLaMA 3, and it would be more interesting to test models of different sizes.
2. The experimental results focus on fine-tuning. It would be better if the authors also considered full pre-training. In addition, the paper primarily compares against LoRA-based baselines and lacks evaluation against GaLore or GaLore variants, such as GoLore [1] and SARA [2].

[1] Yutong He, Pengrui Li, Yipeng Hu, Chuyan Chen, and Kun Yuan. "Subspace Optimization for Large Language Models with Convergence Guarantees." ICML 2025.
[2] Haochen Zhang, Junze Yin, Guanchu Wang, Zirui Liu, Tianyi Zhang, Anshumali Shrivastava, Lin Yang, and Vladimir Braverman. "Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining." NeurIPS 2025.

Questions: Please see the weaknesses.

EditLens Prediction: Fully human-written