ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction  | Count    | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated   | 2 (50%)  | 6.00       | 3.00           | 1542
Heavily AI-edited    | 0 (0%)   | N/A        | N/A            | N/A
Moderately AI-edited | 1 (25%)  | 4.00       | 4.00           | 3764
Lightly AI-edited    | 0 (0%)   | N/A        | N/A            | N/A
Fully human-written  | 1 (25%)  | 4.00       | 4.00           | 2105
Total                | 4 (100%) | 5.00       | 3.50           | 2238
Title: Accurate and Efficient Singular Value Decomposition For LLMs via Decay-aware Rank Allocation and Feature-Preserved Weight Update
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper introduces the DF-SVD framework, which aims to simultaneously improve accuracy recovery and compression efficiency in large language models. The authors first identify two key issues with conventional SVD-based compression methods: difficulty in selecting appropriate truncation and update ranks, and limited fine-tuning stability. To address these challenges, the paper proposes two core modules. The first, Decay-Aware Rank Allocation, models the singular value decay rate of each layer's weight matrix to dynamically determine both truncation and update ranks, achieving adaptive compression across layers and matrices. The second, Feature-Preserved Weight Update, freezes the dominant components and only updates the minor subspace, thereby preserving critical pretrained features while improving the isotropy of the Hessian for faster convergence. Experimental results show that DF-SVD consistently outperforms existing methods such as SVD-LLM, ASVD, and Dobi-SVD on LLaMA, LLaMA2, LLaMA3, and OPT models under 30–60% compression, achieving comparable accuracy with 7–16× faster end-to-end compression.

Strengths:
1. Clear and practical implementation design. The paper follows the SVD-LLM pipeline with whitening (via Cholesky decomposition) and SVD pre-processing, while confining its innovation to rank allocation and the update subspace. This design choice makes the method easy to reproduce, integrate, and deploy in real-world model compression workflows.
2. Comprehensive experimental evaluation. The experiments cover multiple model families (LLaMA and OPT) and diverse datasets, and report both accuracy and end-to-end compression time. The study also compares DF-SVD against pruning and quantization methods, demonstrating its compatibility and potential for combined use.

Weaknesses:
1. Limited novelty. The paper's motivation (improving rank selection and reducing update time) targets a well-studied problem. While the proposed approach is practical, it appears relatively straightforward and lacks deeper theoretical innovation. For example, using singular value decay as a heuristic for rank allocation is intuitive but overlooks inter-layer importance differences; in practice, some critical layers may still require higher ranks even with rapid singular value decay.
2. Insufficient validation of the exponential decay assumption. The core rank allocation mechanism hinges on the assumption that singular values follow an approximately exponential decay pattern and can be modeled by a single parameter λ. Although the paper provides preliminary theoretical reasoning and empirical evidence, it lacks sensitivity analyses showing how deviations from this assumption affect model performance, as well as more rigorous theoretical justification.
3. Under-examined assumptions in the optimization analysis. The theoretical claim that the Hessian becomes isotropic ($H = 2I$) depends on the assumption of nearly orthogonal, whitened inputs. However, it remains unclear whether this assumption holds under small-sample calibration or distributional shift, and whether it is consistent across different layers or batches. While freezing principal components may preserve pretrained knowledge, it could hinder adaptation in cases of aggressive compression or significant domain shift.
4. Modest empirical gains. In the reported results, DF-SVD achieves only marginal improvements over SVD-LLM in accuracy, which may not be sufficient to demonstrate a strong advantage given that DF-SVD employs mixed-rank allocation whereas SVD-LLM uses a fixed rank. Although DF-SVD shows faster compression compared with Ada-SVD and Dobi-SVD, the performance comparisons are not exhaustive, leaving some uncertainty about its overall effectiveness.

Questions: Refer to the weaknesses section.

EditLens Prediction: Moderately AI-edited
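As a concrete companion to Weakness 2 above, the exponential-decay assumption can be sketched in a few lines: fit the decay rate λ of a singular value spectrum by least squares on the log-spectrum under the stated model σ_i ≈ σ_1·exp(-λ·i), then scale a base rank by the normalized decay rate. The allocation rule below (allocate_rank, base_rank, lam_norm) is a hypothetical illustration of the idea, not the paper's exact formula.

```python
import numpy as np

def fit_decay_rate(singular_values):
    """Fit lambda in the model sigma_i ~ sigma_1 * exp(-lambda * i)
    by least squares on the log singular values."""
    s = np.asarray(singular_values, dtype=np.float64)
    s = s[s > 0]                                   # guard before taking logs
    slope, _ = np.polyfit(np.arange(len(s)), np.log(s), deg=1)
    return -slope                                  # steeper decay -> larger lambda

def allocate_rank(base_rank, lam, lam_all, full_rank):
    """Hypothetical allocation rule: matrices whose spectrum decays faster concentrate
    energy in fewer directions, so they receive a smaller retained rank."""
    lam_norm = lam / np.mean(lam_all)              # normalize against the other matrices
    rank = int(round(base_rank / lam_norm))
    return max(1, min(rank, full_rank))
```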
Title: Accurate and Efficient Singular Value Decomposition For LLMs via Decay-aware Rank Allocation and Feature-Preserved Weight Update
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper proposes DF-SVD, an SVD-based compression framework for large language models (LLMs). It addresses two key challenges in existing SVD compression:
1. Rank Selection Problem – current methods rely on costly search or uniform rank allocation. DF-SVD introduces decay-aware rank allocation, which leverages the singular value spectrum's decay rate to assign truncation and update ranks per weight matrix dynamically.
2. Limited Accuracy Restoration – sequential weight updates in prior work (e.g., SVD-LLM) lead to Hessian anisotropy and slow convergence. DF-SVD proposes a feature-preserved weight update strategy that freezes principal components and only updates minor components, ensuring Hessian isotropy and preserving pretrained knowledge.

Strengths:
1. Clear motivation and problem definition: identifies two fundamental bottlenecks in SVD compression (rank allocation and update inefficiency).
2. Theoretical contribution: provides analysis showing Hessian isotropy under the proposed update scheme, linking spectral properties to convergence guarantees.

Weaknesses:
1. Generalization to larger models: experiments are on 7B–8B-scale models; it remains uncertain how well DF-SVD scales to 30B+ models. Have you tested DF-SVD on very large models (e.g., Qwen3-30B-A3B-Instruct-2507)? Does the efficiency advantage hold at that scale?

EditLens Prediction: Fully AI-generated
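The Hessian-isotropy claim discussed in the reviews above can be checked numerically in a few lines, assuming (as the reviews indicate the paper does) that the calibration activations are first whitened. This is a minimal sketch with made-up dimensions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 256
X = rng.standard_normal((d, n))              # toy calibration activations

# Cholesky-based whitening, as in SVD-LLM-style pipelines: afterwards Xw @ Xw.T = I.
L = np.linalg.cholesky(X @ X.T)
Xw = np.linalg.solve(L, X)

# For the per-row reconstruction loss f(w) = ||w @ Xw||^2, the Hessian is 2 * Xw @ Xw.T,
# which reduces to 2I once the inputs are whitened, i.e. the quadratic is isotropic.
H = 2 * Xw @ Xw.T
print(np.allclose(Xw @ Xw.T, np.eye(d)))     # True
print(np.allclose(H, 2 * np.eye(d)))         # True
```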
Title: Accurate and Efficient Singular Value Decomposition For LLMs via Decay-aware Rank Allocation and Feature-Preserved Weight Update
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper proposes DF-SVD, a method for compressing LLMs using SVD with two main contributions: (1) decay-aware rank allocation that dynamically determines truncation and update ranks based on singular value decay, and (2) feature-preserved weight updates that achieve an isotropic Hessian by fixing V^T S^{-1} and selectively updating only minor components of UΣ. The method achieves a substantial speedup over existing SVD-based methods while maintaining or improving accuracy.

Strengths:
1. Strong experimental results and practical speedup.
2. Sound theoretical analysis: the Hessian conditioning analysis (Section 3.2) is mathematically sound and provides clear intuition for why the proposed reformulation can achieve better convergence properties.

Weaknesses:
1. Limited novelty relative to SVD-LLM: the paper heavily builds on SVD-LLM's foundation (Cholesky whitening, sequential optimization framework, experimental setup). Much of the methodology is inherited, making this more of an incremental improvement.
2. Missing critical comparisons:
   A. No comparison with AdaLoRA: the paper cites AdaLoRA for importance-based rank allocation but never compares against it.
   B. No comparison with other methods: methods like "Dynamic Low-rank Estimation for Transformer-based Language Models" (Hua et al., EMNLP 2023 Findings) are highly relevant but not discussed or compared.
3. No empirical validation that the decay coefficient actually correlates with ground-truth importance (e.g., gradient magnitudes, ablation impact).

Questions:
1. Can you show via experiments that Hessian isotropy causes the speedup (e.g., via iteration counts, convergence curves)?
2. Why not compare with AdaLoRA, which you cite as inspiration?
3. What is the correlation between λ_norm and ground-truth importance metrics (gradients, sensitivity)?

EditLens Prediction: Fully AI-generated
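To make the frozen/updated split in the summary above concrete, here is a hedged PyTorch sketch of one way to parameterize it: keep a rank-r truncation, freeze the dominant columns of UΣ, and expose only the trailing (minor) columns as trainable, with the right factor held fixed. The function name and the exact choice of which columns count as "minor" are illustrative assumptions, not the paper's definition.

```python
import torch

def feature_preserved_factors(W_whitened, r_trunc, r_update):
    """Rank-r_trunc SVD factors with the principal block of U @ diag(S) frozen and only
    the last r_update (minor) columns trainable; the right factor is kept fixed."""
    U, S, Vh = torch.linalg.svd(W_whitened, full_matrices=False)
    US = U[:, :r_trunc] * S[:r_trunc]                      # left factor, columns ordered by energy
    principal = US[:, : r_trunc - r_update].clone()        # frozen: dominant directions
    minor = US[:, r_trunc - r_update :].clone().requires_grad_(True)  # trainable: minor directions
    right = Vh[:r_trunc, :].clone()                        # fixed right factor
    return principal, minor, right
```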
Title: Accurate and Efficient Singular Value Decomposition For LLMs via Decay-aware Rank Allocation and Feature-Preserved Weight Update
Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper proposes DF-SVD to compress large language models using singular value decomposition. It addresses two key challenges: the rank selection problem and limited accuracy restoration. There are two innovations:
1. Decay-Aware Rank Allocation, which dynamically assigns truncation and update ranks to each weight based on its singular value decay characteristics, eliminating the need for costly search;
2. Feature-Preserved Weight Update, a theoretically grounded strategy that freezes key matrix components while only updating minor ones. This update strategy ensures an isotropic Hessian, leading to superior accuracy and faster convergence.
The results show that DF-SVD outperforms existing methods.

Strengths:
1. The paper validates DF-SVD across four different models (LLaMA 1/2/3 and OPT) and eight datasets, consistently demonstrating superior performance.
2. The authors provide a detailed ablation study that confirms the positive impact of both the Decay-Aware Rank Allocation and the Feature-Preserved Weight Update components.
3. The method is efficient, completing the entire compression process 7–16 times faster than competing SVD baselines.

Weaknesses:
1. The Decay-Aware Rank Allocation method relies on an original truncation position ($ra_{old}$) and update rank ($rank_{old}$). It is not clear how these critical baseline values are chosen, which makes the results difficult to reproduce.
2. The assumptions lack theoretical proof; for example, the claim that the singular value spectra follow an exponential decay model should be justified.
3. I was wondering whether the reported wall-clock time includes the LoRA fine-tuning stage, or only the SVD and calibration steps.
4. The update procedure using LoRA in Section 3.2 looks quite similar to SVD-LLM. Could you please articulate the key differences and novelty?
5. The analysis of the Hessian (convergence) is based on minimizing reconstruction error, not the model's final task loss. Optimality for the reconstruction objective may not hold for the task objective. Could this negatively impact task performance?

Questions: Please see the weaknesses.

EditLens Prediction: Fully human-written
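Regarding Weakness 5 above, the distinction at issue is between the layer-wise reconstruction objective that the convergence analysis optimizes and the downstream task loss that is actually reported. A one-function sketch of the former, with illustrative names:

```python
import numpy as np

def reconstruction_loss(W_orig, W_compressed, X_calib):
    """Layer-wise surrogate the Hessian analysis concerns: make the compressed layer
    reproduce the original layer's outputs on calibration activations."""
    return np.linalg.norm((W_orig - W_compressed) @ X_calib, ord="fro") ** 2

# A minimizer of this surrogate need not minimize the end-task loss (e.g. next-token
# cross-entropy / perplexity), which is the gap the reviewer's Weakness 5 points at.
```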