ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction  | Count    | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated   | 0 (0%)   | N/A        | N/A            | N/A
Heavily AI-edited    | 1 (25%)  | 6.00       | 4.00           | 2923
Moderately AI-edited | 0 (0%)   | N/A        | N/A            | N/A
Lightly AI-edited    | 2 (50%)  | 3.00       | 2.50           | 2480
Fully human-written  | 1 (25%)  | 2.00       | 3.00           | 2751
Total                | 4 (100%) | 3.50       | 3.00           | 2659
Each review below lists the submission title, the reviewer ratings, the review text (summary, strengths, weaknesses, questions), and the EditLens prediction.
Review 1: The Price of Explainability for Kernel $k$-means

Soundness: 2: fair | Presentation: 2: fair | Contribution: 2: fair | Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper investigates the price of explainability in kernel k-means clustering. The authors theoretically analyze explainability costs under the linear, Gaussian, and Laplacian kernels, both with and without bijective centers or order-preserving points. Furthermore, the paper introduces DT², which avoids the kernel approximations used in KIMM.

Strengths:
1. The paper introduces two concepts, Bijective Centers and Order-Preserving Points, into the explainability analysis of kernel k-means; they are used to examine the mapping relationship between the RKHS and the original feature space.
2. The proposed Dual-Threshold Tree (DT²) algorithm is a direct improvement over KIMM.

Weaknesses:
1. The paper would benefit from a clearer articulation of its research motivation. Clarifying whether the primary aim is to enhance the performance or the explainability of kernel k-means would help readers better appreciate the contribution and positioning of the work.
2. The logical flow of the writing could be further refined. Introducing the concept of kernel k-means explainability before discussing the explainability cost, and explicitly connecting this analysis to the proposed Dual-Threshold Tree algorithm, would make the overall narrative more coherent and accessible.
3. The discussion of explainability itself could be deepened. While the paper focuses on the price of explainability, it provides limited theoretical insight into what constitutes a "good" or "faithful" explanation. A more explicit exploration of how the proposed method enhances or measures explainability would be highly valuable and is something the community would look forward to seeing.
4. While the current experimental results show only modest improvements, especially in Figure 5, the study presents a promising foundation. A deeper investigation into scenarios or datasets where the Dual-Threshold Tree achieves more pronounced gains would strengthen the empirical impact of the paper.
5. The experimental evaluation could be expanded to better illustrate the interpretability advantages of the proposed method. Including visualizations of explanation results or quantitative comparisons (e.g., Rand index against ground truth) would enrich the assessment of explanation quality.
6. The contribution could be further broadened. Although the paper builds thoughtfully on "Explaining Kernel Clustering via Decision Trees," extending the theoretical analysis beyond Gaussian and Laplacian kernels (e.g., to dot-product, polynomial, or other translation-invariant kernels) would make the findings more comprehensive and impactful.

Questions:
1. The feature mapping of an RKHS is not always invertible. When the feature mapping of kernel k-means is non-invertible, how does the analysis based on Bijective Centers and Order-Preserving Points still apply?
2. For kernel functions other than the Gaussian kernel that lack explicit feature mappings, how can their Bijective Centers be identified?
3. Line 160: What are the specific expressions for $\mathrm{cost}(T, X)$ and $\min_T(T, X)$? (For reference, the standard definitions from the explainable $k$-means literature are reproduced after this review.)
4. The experiments are conducted only on low-dimensional and small-scale datasets. Can DT² be applied to high-dimensional features and large-scale datasets?

EditLens Prediction: Lightly AI-edited
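For reference, regarding question 3 above: in the explainable $k$-means literature the corresponding quantities are standardly defined as follows (the submission's kernelized definitions may differ in detail):

$$
\mathrm{cost}(T, X) \;=\; \sum_{\ell \in \mathrm{leaves}(T)} \; \min_{c_\ell} \sum_{x \in X_\ell} \lVert x - c_\ell \rVert^2,
\qquad
\mathrm{PoE}(X) \;=\; \frac{\min_{T} \mathrm{cost}(T, X)}{\mathrm{cost}_{\mathrm{opt}}(X)},
$$

where $X_\ell$ denotes the points routed to leaf $\ell$ by the axis-aligned threshold tree $T$, the minimum over $T$ ranges over threshold trees with $k$ leaves, and $\mathrm{cost}_{\mathrm{opt}}(X)$ is the optimal unconstrained $k$-means cost; for kernel $k$-means the squared distances would presumably be taken in the RKHS.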
Review 2: The Price of Explainability for Kernel $k$-means

Soundness: 3: good | Presentation: 3: good | Contribution: 3: good | Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper investigates the price of explainability for kernel k-means clustering. It systematically analyzes the trade-off between explainability and clustering performance across different kernel functions, with a particular focus on the Gaussian and Laplacian kernels. The authors propose a novel "Dual-Threshold Tree (DT²)" algorithm that enables the direct construction of interpretable clustering models in the original space without kernel approximation (a generic threshold-tree sketch is given after this review for concreteness). Theoretical and experimental results demonstrate that kernel k-means based on the Laplacian kernel incurs a lower explainability cost, and that the DT² algorithm outperforms the existing KIMM method on multiple datasets. The study further reveals the intrinsic connections between the Laplacian kernel and k-medians, as well as between the Gaussian kernel and k-means, thereby providing a new perspective on interpretable clustering.

Strengths:
1. Strong novelty: this is the first study to systematically analyze the price of explainability of different kernel functions from both theoretical and experimental perspectives, and it proposes the DT² algorithm to reduce the interpretability loss.
2. Tight integration of theory and experiment: the paper verifies its findings through rigorous theorem derivation and experimental evidence, with high consistency between theoretical and experimental results.
3. Practical method: the DT² algorithm eliminates the need for kernel approximation, enabling direct construction of interpretable models in the original space, and demonstrates good scalability.
4. Novel research perspective: the paper reveals the connections between the Laplacian kernel and k-medians, as well as between the Gaussian kernel and k-means, expanding the theoretical framework for clustering interpretability.

Weaknesses:
1. Insufficient discussion of limitations: only three types of kernels (linear, Gaussian, and Laplacian) are analyzed, without an in-depth exploration of other kernel functions (e.g., the neural tangent kernel).
2. Overly strong theoretical assumptions: some theorems rely on ideal conditions, such as the existence of an invertible center or order-preserving points, which may be difficult to satisfy on real-world data.
3. Limited experimental scale: the verification is carried out only on five small-to-medium-sized datasets, lacking tests in large-scale and high-dimensional scenarios.

Questions:
1. The experiments use only five small-scale datasets. Have you tested high-dimensional or real-world data (such as images or text), and do the results remain consistent?
2. The DT² algorithm constructs a dual-threshold tree in the original space to avoid kernel approximation. How does its time complexity compare to that of the KIMM method?
3. The theorems assume the "existence of order-preserving points". Is this condition verifiable, or approximately satisfiable, on the distributions of real high-dimensional data?

EditLens Prediction: Heavily AI-edited
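For concreteness, the sketch below shows a generic axis-aligned, single-threshold tree in the original space and how the clustering cost it induces can be evaluated. It is a minimal illustration only: it does not reproduce the paper's DT² construction (whose dual-threshold splits are not shown here) or KIMM, and the tree layout, toy data, and thresholds are placeholder choices.

```python
# Illustrative sketch only: a generic axis-aligned, single-threshold tree and
# the k-means cost of the partition it induces in the original space.
# This is NOT the paper's DT^2 construction or KIMM; the tree layout, toy
# data, and thresholds below are placeholder choices.
import numpy as np

def assign_leaf(x, node):
    """Route one point down a threshold tree given as nested dicts."""
    while "leaf" not in node:
        # Each internal node tests one feature against one threshold.
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def tree_cost(X, tree):
    """Clustering cost of the tree's leaf partition, with leaf means as centers."""
    labels = np.array([assign_leaf(x, tree) for x in X])
    return sum(((X[labels == l] - X[labels == l].mean(axis=0)) ** 2).sum()
               for l in np.unique(labels))

# Toy usage: two well-separated Gaussian blobs and a depth-1 split on feature 0.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(1.0, 0.1, (50, 2))])
tree = {"feature": 0, "threshold": 0.5, "left": {"leaf": 0}, "right": {"leaf": 1}}
print(tree_cost(X, tree))  # close to the unconstrained 2-means cost on this toy data
```

Comparing such a tree-induced cost with the unconstrained (kernel) k-means cost is what the price-of-explainability analysis formalizes.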
Review 3: The Price of Explainability for Kernel $k$-means

Soundness: 1: poor | Presentation: 2: fair | Contribution: 1: poor | Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
The paper studies approximation guarantees for decision trees in clustering with kernel $k$-means. The guarantees are given in terms of the ratio between the clustering cost of a tree and the optimal cost; this ratio is referred to as the price of explainability (POE). The paper begins by stating the POE for the linear kernel, which reduces the problem to $k$-means. Then, the paper introduces assumptions on the data distribution to derive bounds on the POE. Firstly, it is assumed that the cluster mean in the RKHS has a pre-image in the input space. Secondly, the authors introduce the notion of order-preserving points. For both, the POE is claimed to be $\mathcal{O}(k \ln \ln k)$ for the Gaussian kernel. Thirdly, if such points do not exist, the cost is claimed to depend additionally on the radius of the dataset. Corresponding results are derived for the Laplace kernel. Finally, the paper proposes a greedy cost-minimizing algorithm and evaluates it on a few datasets.

Strengths:
Studying kernel clustering and its interpretability under assumptions on the distribution is interesting, as is the notion of order-preserving points.

Weaknesses:
The paper lacks clarity and I have concerns regarding the theory.

(1) The notion of an interpretable threshold tree is not formally defined. The paper probably uses the same notion of axis-aligned trees as in prior works, but this should be formalized.

(2) For the Gaussian or Laplace kernel, there can never exist a pre-image for cluster means unless all points in a cluster are identical. Proof: take any $(x_i)_{i=1,\ldots,n}$. Then, with $\psi$ denoting the feature map, the mean in the RKHS is $\mu = \frac{1}{n} \sum_{i=1}^n \psi(x_i)$. Therefore, $\|\mu\|^2 = \frac{1}{n^2} \sum_{i,j=1}^n \kappa(x_i, x_j) < 1$. However, for any $\rho$ in the input domain, $\|\psi(\rho)\|^2 = \kappa(\rho,\rho) = 1$. (A numerical illustration of this argument is given after this review.) Unfortunately, this makes a relevant part of the theory void.

(3) For the case of order-preserving points, the proof of Theorem 3.6 is confusing (Appendix H). Essentially, the proof starts with $(I) = \sum_j \sum_x \| \psi(x) - \mu_j \|^2$ and upper bounds it by $(II) = c \cdot \mathcal{O}(k \ln \ln k) \cdot \mathrm{cost}_{\mathrm{opt}}$. Unless I am missing something, $(I)$ is already the cost of clustering w.r.t. the means in the RKHS, not the cost of clustering with a tree. It is not apparent where in the proof a tree is analyzed. Furthermore, the term $\mathcal{O}(k \ln \ln k)$ appears in several derivations without justification.

(4) The paper also mentions the cost of explainability. How does it give us insight into the price, as alluded to after Theorems 3.7 and 3.8?

(5) It is not clear how order-preserving points would be found in practice, or what natural conditions would guarantee their existence.

Questions:
See the questions raised under weaknesses.

EditLens Prediction: Fully human-written
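A minimal numerical illustration of the pre-image argument in weakness (2) above, assuming a Gaussian kernel with an arbitrary bandwidth on toy data (all specific values are placeholders):

```python
# Numerical check of the argument in point (2): for a normalized kernel such as
# the Gaussian, the squared RKHS norm of a cluster mean is strictly below 1
# unless all points coincide, whereas every candidate pre-image has squared
# RKHS norm kappa(rho, rho) = 1. Toy data and bandwidth are arbitrary.
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))   # one "cluster" of 20 non-identical points in R^3
n = len(X)

# ||mu||^2 = (1/n^2) * sum_{i,j} kappa(x_i, x_j) < 1
mu_sq_norm = sum(gaussian_kernel(X[i], X[j]) for i in range(n) for j in range(n)) / n**2

# ||psi(rho)||^2 = kappa(rho, rho) = 1 for any rho in the input space,
# e.g. the input-space mean as a natural pre-image candidate.
rho = X.mean(axis=0)
print(mu_sq_norm, gaussian_kernel(rho, rho))  # prints a value well below 1, then 1.0
```

The same check applies verbatim to the Laplace kernel, which also satisfies $\kappa(x, x) = 1$.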
Review 4: The Price of Explainability for Kernel $k$-means

Soundness: 3: good | Presentation: 3: good | Contribution: 3: good | Rating: 4: marginally below the acceptance threshold
Confidence: 1: You are unable to assess this paper and have alerted the ACs to seek an opinion from different reviewers.

Summary:
The explainability of machine learning models has received increasing attention recently, for reasons of security and model reliability. This paper studies explainable kernel clustering and compares the explainability of kernel k-means algorithms based on different kernels. In particular, it shows that kernel k-means clustering with the Laplacian kernel has a lower price of explainability than that with the Gaussian kernel. In addition, the paper proposes a new algorithm that directly constructs a dual-threshold tree in the original space to achieve interpretable kernel k-means, and shows experimentally that it outperforms KIMM, which constructs its threshold tree in the kernel space.

Strengths:
1. The structure of the paper is relatively clear, though the writing quality is slightly lacking.
2. The theoretical derivations and proofs are quite thorough.
3. The interpretability of kernel k-means (KKM) is a promising research direction with significant academic value.

Weaknesses:
1. There are punctuation errors and typos in the formulas.
2. The experimental section is excessively simplistic, and the datasets used are essentially toy datasets of very small scale.
3. Compared with KIMM, the improvement of the proposed method in performance or price is minimal and marginal.
4. I believe the paper fails to adequately elaborate on how the interpretability of KKM should be explained and how it can guide kernel clustering algorithms. Regarding the paper's contributions: the first is not substantial enough to be listed independently; the second merely involves calculations and derivations; the third is a simple algorithmic illustration. Overall, this paper significantly lacks innovation and is not qualified for publication at ICLR.

Questions:
As above.

EditLens Prediction: Lightly AI-edited