ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated | 1 (25%) | 4.00 | 5.00 | 1926
Heavily AI-edited | 0 (0%) | N/A | N/A | N/A
Moderately AI-edited | 1 (25%) | 6.00 | 4.00 | 3445
Lightly AI-edited | 1 (25%) | 2.00 | 3.00 | 2052
Fully human-written | 1 (25%) | 6.00 | 4.00 | 1697
Total | 4 (100%) | 4.50 | 4.00 | 2280
Title: IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

This paper introduces IMPQ, a novel framework for mixed-precision quantization of Large Language Models (LLMs) that addresses the critical challenge of deploying massive models on resource-constrained devices. The core innovation lies in modeling quantization as a cooperative game among transformer layers, where the authors propose SPQE (Shapley-based Progressive Quantization Estimation) to capture layer sensitivities and inter-layer interactions through progressive quantization rather than abrupt pruning.

The application of cooperative game theory and Shapley values to mixed-precision quantization represents a significant conceptual advance. By framing layers as players in a cooperative game, the authors provide a principled approach to quantifying both individual layer contributions and inter-layer interactions, which existing methods neglect. SPQE's progressive quantization (from 4-bit to 2-bit) is a clever innovation that maintains model stability during Shapley estimation. This approach effectively avoids the catastrophic performance degradation and high variance associated with layer pruning, enabling more accurate and reliable layer importance assessment.

While the paper analyzes the impact of the number of Monte Carlo samples, it does not thoroughly examine other hyperparameters. The diagonal shrinkage parameter $\alpha$ is fixed at 0.5 without justification or sensitivity analysis. The choice of baseline (4-bit) and target (2-bit) precisions is also not motivated or varied. Experiments rely primarily on C4 for Shapley estimation and WikiText-2 for evaluation. Testing on more diverse domains (e.g., code, multilingual text) and larger datasets would strengthen claims about generalizability, especially given the domain-specific nature of quantization effects. While several baselines are included, comparisons with recent Hessian-based methods like HAWQ are limited. The paper also does not compare against neural architecture search approaches for quantization, which could provide additional context for the performance gains.

I am mainly interested in the presentation and the theoretical parts; the points below are things I found confusing, and I would welcome corrections if I have misunderstood. The paper assumes Monte Carlo sampling provides accurate Shapley approximations without a theoretical analysis of the approximation error. The value function $v_{\text{NLL}}(S) = \mathbb{E}_{(x,t)\sim D}[-\log p(x_{t+1}|x_{\leq t}; S)]$ in Equation 3 is not justified as the optimal choice for measuring layer contributions in the cooperative game framework. Section 3.2 begins with a second-order Taylor approximation $\Delta L \approx \sum_{i=1}^{L} g_i^\top \epsilon_i + \sum_{i=1}^{L} \sum_{j=1}^{L} \epsilon_i^\top H_{ij} \epsilon_j$ but then switches to a Shapley-based approach without reconciling these perspectives. The covariance matrix $C = \frac{1}{M}(\Delta v_\ell - \hat{\phi})^\top (\Delta v_\ell - \hat{\phi})$ in Equation 8 is proposed as a Hessian proxy without theoretical justification for this equivalence. The distribution $D$ in Equation 3 is not clearly specified. The perturbation $\epsilon_i$ in Section 3.2 is used without defining its relationship to quantization error. The construction of the covariance matrix $C$ in Equation 8 lacks clarity regarding dimensions and the precise meaning of $\Delta v_\ell$. The authors are kindly asked to check these notations.

See the weaknesses, mainly regarding the experimental settings, theory, and notation.

EditLens Prediction: Moderately AI-edited
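For context on the quantities this review questions (the Monte Carlo Shapley estimates $\hat{\phi}$ and the covariance-based Hessian proxy $C$), the following is a minimal, generic permutation-sampling sketch. The `value_fn` callback and all other names are placeholders rather than the paper's actual interface, and the paper's SPQE procedure may differ in detail.

```python
import random
import numpy as np

def mc_shapley(num_layers, value_fn, num_samples=64):
    """Plain Monte Carlo (permutation) Shapley estimation.

    value_fn(coalition) is assumed to return the average NLL when the
    layers in `coalition` are progressively quantized from 4-bit down
    to 2-bit while the remaining layers stay at 4-bit (a stand-in for
    the value function v_NLL quoted above from Equation 3).
    """
    contribs = np.zeros((num_samples, num_layers))
    for m in range(num_samples):
        perm = random.sample(range(num_layers), num_layers)
        coalition = []
        prev = value_fn(coalition)
        for layer in perm:
            coalition.append(layer)
            cur = value_fn(coalition)
            contribs[m, layer] = cur - prev   # marginal contribution of this layer
            prev = cur
    phi_hat = contribs.mean(axis=0)            # Shapley estimates
    centered = contribs - phi_hat
    C = centered.T @ centered / num_samples    # sample covariance, shape (L, L)
    return phi_hat, C
```

In this generic form the covariance matches the structure of Equation 8 as quoted in the review; whether and how the paper applies the diagonal shrinkage governed by $\alpha$ is not reflected here.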
Title: IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

This paper introduces SPQE, a Shapley-based approach for estimating layer importance via progressive quantization, and IMPQ, a MILP-based method for assigning 2- or 4-bit precision under memory constraints. The authors frame mixed-precision quantization as a cooperative game among layers, capturing inter-layer dependencies more effectively than existing heuristics. Evaluated on several LLMs and PTQ backends, IMPQ achieves significantly lower perplexity, especially under 2-bit constraints, demonstrating strong empirical performance and robustness across models and settings.

The key strengths lie in the originality of modeling quantization as a cooperative game, the method’s stability under aggressive bit reductions, and the extensive and thorough experimental validation. The results clearly show that accounting for inter-layer interactions leads to better bit allocation and quantized performance than isolated sensitivity measures.

1. The method carries substantial computational overhead, with SPQE requiring many hours to estimate Shapley values even for mid-sized models.
2. The approach is currently limited to binary 2-bit/4-bit decisions, which restricts its generality, and the MILP formulation, though optimal in theory, raises questions about scalability to larger models or finer-grained bit options.
3. Moreover, the paper does not explore how robust the final assignments are to noise in the Shapley estimates, nor does it fully explain implementation details such as memory-constraint handling or solver configuration.

1. It would help to know whether the authors plan to support finer bit precision (e.g., 3-bit or 8-bit layers), and whether the SPQE or MILP runtimes could be reduced through approximation or more scalable formulations.
2. Additionally, can this game-theoretic framework extend to other compression tasks, such as structured pruning, where modeling interactions is equally important?

EditLens Prediction: Fully AI-generated
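As a point of reference for the MILP step this review describes (binary 2-/4-bit assignment under a memory budget), here is a minimal sketch using SciPy's generic MILP solver. The cost vector, size arrays, and objective are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def assign_bits(sensitivity, size_2bit, size_4bit, memory_budget):
    """Pick x[i] = 1 (keep layer i at 4 bits) or x[i] = 0 (drop to 2 bits).

    `sensitivity` is a per-layer cost of going to 2 bits (e.g. a
    Shapley-based estimate); we maximize the total sensitivity kept at
    4 bits subject to the memory budget. Inputs are length-L arrays
    except the scalar budget (in bytes). Assumes the budget is feasible.
    """
    n = len(sensitivity)
    c = -np.asarray(sensitivity, dtype=float)       # milp minimizes c @ x
    extra = np.asarray(size_4bit) - np.asarray(size_2bit)
    budget = LinearConstraint(extra, lb=-np.inf,
                              ub=memory_budget - np.sum(size_2bit))
    res = milp(c=c, constraints=[budget],
               integrality=np.ones(n), bounds=Bounds(0, 1))
    return np.rint(res.x).astype(int)               # 1 -> 4-bit, 0 -> 2-bit
```

With one binary variable per layer the MILP instance itself is small; adding finer-grained bit options would mainly multiply the number of variables per layer, while the dominant cost presumably remains the Shapley estimation that produces the inputs.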
Title: IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

This paper introduces IMPQ, a mixed-precision PTQ method for LLMs that leverages Shapley values to estimate the importance/sensitivity of individual layers and forms a Hessian-scaled objective for the mixed-precision problem that can be solved with quadratic integer programming solvers. The method is clear, and the evaluation supports its claim of effectiveness over baselines.

- The complexity of the algorithm needs to be analyzed: the calculation of Shapley values via Monte Carlo sampling seems very computationally expensive for LLMs. A comparison of the computational complexity of IMPQ against the baselines may be needed, in terms of important factors such as the number of layers and the number of samples.
- Downstream task evaluation is missing. The evaluation section only reports WikiText perplexity, but IMPQ and some of the baseline quantization methods need calibration. The results would be more convincing if downstream tasks were included. Given this, how hard would it be to apply the method to MoE models?

Besides the weaknesses above, could the authors answer the following questions:
1. Why is the average per-token NLL used as the payoff?
2. Could the authors explain, or give some intuition for, why IMPQ outperforms the baselines?
3. Apart from the computational complexity analysis requested in the weaknesses, could the authors compare quantization times? This would make the complexity-performance trade-off easier for readers to grasp.
4. Regarding the selection of hyperparameters such as $\alpha$ in line 260: why was the specific value of 0.5 chosen?

There are a few typos in the paper due to citation formatting issues, e.g., in lines 109 and 112.

EditLens Prediction: Fully human-written
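Question 1 above concerns the average per-token NLL payoff (the $v_{\text{NLL}}$ quoted from Equation 3 in the first review). For concreteness, a minimal sketch of how such a payoff could be computed for a quantized model is given below; it assumes a Hugging Face-style causal LM that exposes `.logits`, and is not the paper's implementation.

```python
import torch

@torch.no_grad()
def avg_per_token_nll(model, input_ids):
    """Average per-token negative log-likelihood over a calibration batch.

    `model` is assumed to be a causal LM whose layers have already been
    quantized according to some coalition S; `input_ids` has shape
    (batch, seq_len).
    """
    logits = model(input_ids).logits                       # (batch, seq, vocab)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # predict token t+1 from prefix
    targets = input_ids[:, 1:].unsqueeze(-1)
    nll = -log_probs.gather(dim=-1, index=targets).squeeze(-1)
    return nll.mean().item()
```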
Title: IMPQ: Interaction-Aware Layerwise Mixed Precision Quantization for LLMs
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

This paper proposes SPQE, which is based on cooperative game theory, to obtain accurate Shapley estimates of layer sensitivities and inter-layer interactions, and uses IMPQ to find optimal bit-width assignments.

1. An innovative use of Shapley value analysis and cooperative games among LLM layers to model mixed-precision quantization.
2. Demonstrated performance improvements on Llama-3, Gemma-2, and Qwen-3.

1. Why is modeling mixed-precision quantization with Shapley value analysis and cooperative games among LLM layers more effective than traditional Hessian-based methods? The authors do not clearly explain the motivation.
2. The experimental setup seems problematic; I believe the paper does not evaluate quantization performance under an accepted standard setting. (a) The performance of the full-precision baseline is not stated. (b) The bit range defined in the paper (e.g., 2.5–3.0) is overly broad, and it is not explained how these bit budgets correspond to specific mixing ratios or group sizes, nor how fairness is maintained across the different baselines. (c) The reported results are inconsistent with previous work: a perplexity (ppl) of around 15–25 is too high, much worse than values reported in existing papers. For example, in [1], methods such as OmniQuant and CherryQ achieve ppl < 10 at 2.15 bits. Although the experimental setup in [1] differs from this paper's, the discrepancy should not be this large.
3. Continuing from 2(c), I believe that while the sensitivity estimates have room for optimization, improving only the sensitivity is of limited benefit; even an optimal sensitivity-selection strategy should yield only a small decrease in perplexity. Therefore, the claim in Table 1 that IMPQ reduces ppl by about 10 compared with other baselines may not be credible, and the authors may not have obtained the optimal performance for the baselines.
4. Some writing issues, e.g., the font in Table 1 is too small.

[1] Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models

Please see the weaknesses.

EditLens Prediction: Lightly AI-edited
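Regarding point 2(b), a rough back-of-the-envelope mapping (assuming only 2-bit and 4-bit layers of roughly equal size, which the paper may or may not satisfy) is that an average budget of $\bar{b}$ bits corresponds to a 4-bit fraction $f_4 = (\bar{b} - 2)/2$: a 2.5-bit budget would place about 25% of layers at 4 bits and a 3.0-bit budget about 50%. Group sizes, scales, and zero-points would shift these numbers, which is exactly the accounting the review asks the authors to spell out.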