ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction    Count     Avg Rating   Avg Confidence   Avg Length (chars)
Fully AI-generated     0 (0%)    N/A          N/A              N/A
Heavily AI-edited      0 (0%)    N/A          N/A              N/A
Moderately AI-edited   0 (0%)    N/A          N/A              N/A
Lightly AI-edited      0 (0%)    N/A          N/A              N/A
Fully human-written    3 (100%)  2.67         4.00             2065
Total                  3 (100%)  2.67         4.00             2065
CMPS: Constrained Mixed Precision Search

Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
The paper introduces a differentiable NAS algorithm for data format allocation in a network. It first formalizes the optimization problem, including architecture constraints (the maximum average number of bits for a model), then proposes a gradient-descent-based heuristic for solving the mixed-precision data format allocation problem. While the method is fully post-training, it still requires a small calibration data set to train the data format precision parameters. The paper shows that, using this formalism, mixed-precision constrained NAS can achieve better results than uniform quantization. (An illustrative sketch of this formulation follows the review.)

Strengths:
- The mathematical formulation of the constrained optimization problem for mixed-precision data format allocation seems fairly general.
- The large number of results (combinations of models and tasks used for calibration) seems to demonstrate the robustness of the approach.

Weaknesses:
- The paper completely lacks any comparison with the state of the art! No comparison with other mixed-precision post-training optimization methods is even attempted, yet plenty exist. That is clearly a major issue in this paper.
- While the foundations of differentiable NAS methods seem to be adequately described and cited, the novelty of the proposed method remains hard to grasp. I would suggest adding a short but clear statement of what it brings compared to the closest SoTA work.
- While the method seems very general, it is frustrating to see it tested on a single NAS scenario, namely 4.5-bit mixed precision with the MXFP data format. What about mixing different formats (integer, FP, ...)? Or testing other maximum average numbers of bits (such as 3.5 or 5.5)?
- The perplexity/accuracy gains of the method remain modest and the proposed NAS scenario is too limited.

Questions:
Please carefully answer the issues mentioned in the weaknesses section. I may increase my rating provided that at least 1) a quantitative comparison with other SoTA methods is provided, and 2) additional NAS scenarios beyond 4.5-bit mixed precision are evaluated.

EditLens Prediction: Fully human-written
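To make the setup described in this review's summary concrete, the snippet below is a minimal, illustrative take on a DARTS-style differentiable format search: each layer holds trainable logits over candidate formats, the forward pass mixes fake-quantized weight branches with softmax weights, and the expected per-layer bit-width is exposed for the budget constraint. All names (CANDIDATE_BITS, format_logits, fake_quant, MixedPrecisionLinear) are assumptions for illustration only, and the placeholder quantizer is a generic rounding scheme, not the MXFP format or the paper's actual implementation.

```python
# Illustrative sketch only; not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

CANDIDATE_BITS = [4, 8]  # e.g. a low-bit and a high-bit branch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Placeholder weight quantizer; a real implementation would apply
    the chosen block format (e.g. MXFP) for the given bit-width."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    return torch.round(w / scale) * scale

class MixedPrecisionLinear(nn.Module):
    """Linear layer whose effective weights are a softmax-weighted mix of
    quantized branches; only the format logits are trainable."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = nn.Parameter(linear.weight.detach(), requires_grad=False)
        self.bias = (None if linear.bias is None
                     else nn.Parameter(linear.bias.detach(), requires_grad=False))
        self.format_logits = nn.Parameter(torch.zeros(len(CANDIDATE_BITS)))

    def expected_bits(self) -> torch.Tensor:
        """Softmax-expected bit-width of this layer, used by the budget constraint."""
        probs = F.softmax(self.format_logits, dim=0)
        bits = torch.tensor(CANDIDATE_BITS, dtype=probs.dtype, device=probs.device)
        return (probs * bits).sum()

    def forward(self, x):
        probs = F.softmax(self.format_logits, dim=0)
        w = sum(p * fake_quant(self.weight, b) for p, b in zip(probs, CANDIDATE_BITS))
        return F.linear(x, w, self.bias)
```

The softmax mixing is what makes the discrete format choice differentiable: gradients from the task loss flow into format_logits, which can later be rounded to a hard per-layer format selection.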
CMPS: Constrained Mixed Precision Search

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This work proposes a new constrained mixed-precision search for post-training quantization. To solve the constrained optimization problem, the authors leverage a barrier-based interior-point method. The method keeps model weights frozen and needs only a small calibration set (128 samples). Experiments on various LLMs report consistent gains over uniform-precision baselines at the same or lower effective bit budgets. (An illustrative sketch of such a barrier-based search step follows the review.)

Strengths:
1. The work addresses the problem of reducing the computational and memory footprint of deployed DNNs, which is practical and important.
2. The paper is well written and easy to follow.
3. At 4.5 bits, the proposed method often beats MXFP in terms of perplexity on the examined benchmarks.

Weaknesses:
1. The authors claim that after rounding there always remains a strictly feasible solution with respect to the budget. I believe a proof of this claim is required.
2. The comparison is limited. The work only compares itself to the MX baselines, but there are many other strong PTQ techniques. Only a single dataset was used in the experiments.
3. No throughput/latency comparisons are provided.
4. The improvement over the baselines is marginal.
5. How does the method perform compared to integer PTQ techniques?
6. According to the experiments, the proposed algorithm does not always meet the constraint.

Questions:
How were the samples for calibration chosen? What is the meaning of the upside-down question mark in the caption of Figure 2? There is a typo in line 75 (duplicated "Our contributions are as follows:").

EditLens Prediction: Fully human-written
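The barrier-based interior-point step summarized above could look roughly like the sketch below: a log-barrier on the slack between the bit budget and the size-weighted expected average bit-width is added to the task loss, and only the per-layer format logits are updated on a small calibration set while the model weights stay frozen. It assumes layers exposing an expected_bits() hook and format_logits parameters as in the earlier sketch, plus an HF-style model that returns an object with a .loss field; barrier_loss and search are illustrative names, not the authors' code.

```python
# Illustrative sketch only; assumes the MixedPrecisionLinear-style layers above.
import torch

def barrier_loss(model, budget_bits: float = 4.5, t: float = 100.0) -> torch.Tensor:
    """(1/t) * log-barrier keeping the size-weighted average bit-width under budget."""
    layers = [m for m in model.modules() if hasattr(m, "expected_bits")]
    bits = torch.stack([m.expected_bits() for m in layers])                    # [L]
    sizes = torch.tensor([float(m.weight.numel()) for m in layers],
                         device=bits.device)                                   # [L]
    avg_bits = (bits * sizes).sum() / sizes.sum()
    slack = budget_bits - avg_bits          # interior point: must stay strictly > 0
    return -torch.log(slack.clamp_min(1e-8)) / t

def search(model, calib_loader, steps: int = 200, lr: float = 1e-2):
    """Optimize only the per-layer format logits on a small calibration set."""
    arch_params = [p for n, p in model.named_parameters() if "format_logits" in n]
    opt = torch.optim.Adam(arch_params, lr=lr)
    data = iter(calib_loader)
    for _ in range(steps):
        try:
            batch = next(data)
        except StopIteration:
            data = iter(calib_loader)
            batch = next(data)
        # Task loss (assumes an HF-style LM output with .loss) plus the barrier term.
        loss = model(**batch).loss + barrier_loss(model)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In a classic interior-point scheme, the barrier weight 1/t would be annealed so the solution approaches the constraint boundary while remaining strictly feasible; a single fixed t is used here only to keep the sketch short.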
CMPS: Constrained Mixed Precision Search

Soundness: 2: fair
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
A DNAS-based post-training mixed-precision quantization method (CMPS) is proposed. CMPS provides fine-grained control over model compression, enabling stable and predictable performance. The proposed CMPS method is compared with uniform quantization baselines, demonstrating the advantages of learnable mixed-precision bit allocation.

Strengths:
1. This paper works on post-training mixed-precision quantization with controllable compression ratios. The problem studied is important and the motivation is clear.
2. A detailed theoretical analysis is provided.

Weaknesses:
1. Quantization details are missing. It seems that CMPS is a weight-only quantization method; however, the quantization details are not provided.
2. The optimization cost is not provided. The advantage of PTQ is its efficiency in quantization optimization, whereas CMPS relies on end-to-end tuning with multiple branches. The speed and memory overheads should be reported.
3. Comparison with previous methods is also missing. The authors did not provide any quantization details, including for the uniform quantization baselines. In the LLM quantization literature, many high-performance PTQ methods have been proposed. What are the performance advantages over these methods? How can the proposed CMPS be combined with these techniques? Moreover, the authors only compared with uniform quantization baselines; the comparison with previous mixed-precision methods is missing.
4. In several places, the paper says "hardware-constrained bit allocation"; however, only the total model size in bits is modeled during the optimization. Moreover, only two bit levels are explored in the bit allocation (MXFP4 and MXFP8). (An illustrative MXFP4 sketch follows the review.)
5. In the experiments, previous methods commonly use wiki2 for calibration in addition to C4. For the zero-shot scenario, only one task (LAMBADA) is evaluated, which is clearly not enough. The largest model used is 3B; experiments on larger models or architectures such as MoEs are also needed.
6. In the limitations, regarding the statement "the memory required to hold activations or gradients for multiple low-bit options might still be comparable to, or less than, holding a single higher-precision (e.g., FP16 or BF16) baseline tensor", a more careful and precise expression should be used. Many PTQ methods do not need to store all activations, and gradients are not needed. However, in CMPS, full-precision activations of all layers and gradients are needed, which expands the memory usage. If these tensors (activations and gradients) can be stored in low-bit form, the authors should verify it with controlled experiments.

Questions:
Please refer to the Weaknesses for further questions.

EditLens Prediction: Fully human-written
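For readers unfamiliar with the MXFP4/MXFP8 formats mentioned in this review, the sketch below shows, in simplified form, how MXFP4-style (OCP microscaling) weight quantization works: blocks of 32 values share a power-of-two scale and each element is rounded to the nearest FP4 (E2M1) magnitude. This is a generic illustration of the format family, not the paper's quantizer; quantize_mxfp4 and its block handling are simplifying assumptions (e.g. the tensor size is assumed divisible by the block size, and saturation/rounding details are approximate).

```python
# Rough illustration of MXFP4-style block quantization; details are simplified.
import torch

FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_mxfp4(w: torch.Tensor, block: int = 32) -> torch.Tensor:
    flat = w.reshape(-1, block)                                # assumes numel % block == 0
    amax = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    # Shared power-of-two scale per block (largest E2M1 value 6 = 1.5 * 2**2).
    scale = 2.0 ** (torch.floor(torch.log2(amax)) - 2)
    scaled = (flat / scale).clamp(-6.0, 6.0)
    # Round each element to the nearest representable FP4 magnitude.
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * torch.sign(scaled) * scale
    return q.reshape(w.shape)
```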