ICLR 2026 - Reviews


Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 4 (100%) | 4.00 | 3.50 | 3029 |
| Fully human-written | 0 (0%) | N/A | N/A | N/A |
| Total | 4 (100%) | 4.00 | 3.50 | 3029 |
Individual Reviews
Review 1

Paper: SimPlex-GT: A Simple Node-to-Cluster Graph Transformer for synergizing homophily and heterophily in Complex Graphs
Soundness: 3 (good) | Presentation: 2 (fair) | Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 4 (confident, but not absolutely certain)

Summary:
SimPlex-GT is a Graph Transformer designed to synergize homophilic and heterophilic patterns in complex graphs. It integrates local GCN-based message passing with a sparse node-to-cluster (N2C) attention mechanism, fuses local and global features via complementary filtering or cluster smoothing, and adopts a self-supervised teacher-student framework with dynamic masking. Evaluated on 11 benchmarks, it achieves state-of-the-art performance on heterophilic graphs, remains competitive on homophilic ones, and offers superior computational efficiency.

Strengths:
1. Effectively handles both homophily (via GCN) and heterophily (via N2C attention) without specializing in either.
2. N2C attention reduces complexity from O(N^2) to near-linear, enabling scalability to large graphs (a minimal sketch of this idea follows the review).
3. Outperforms existing GNNs and GTs on heterophilic datasets (e.g., Texas, Chameleon) while matching top results on homophilic ones (e.g., Cora, PubMed).

Weaknesses:
1. The model's performance is somewhat sensitive to the number of clusters. How should this parameter be tuned for optimal results?
2. In the reported experiments, most Graph Transformers perform worse than traditional GNNs. Does this suggest that building on GTs is not meaningful? Traditional GNNs already have extensive work addressing both homophilic and heterophilic graphs, which gives the impression that adapting GTs to handle homophily and heterophily may be unnecessary.
3. Figure 2 does not seem to convey much information.
4. In Equation 14, what does LN represent? Using $\mathcal{G}$ to denote the representation here can be misleading.

Questions:
See weaknesses.

EditLens Prediction: Lightly AI-edited
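To make the near-linear complexity claim in Strength 2 concrete, here is a minimal sketch, assuming K learnable cluster prototypes act as the keys and values of the attention. The class name, projections, and prototype parameterization are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical node-to-cluster (N2C) attention: each of the N nodes attends to
# K learned cluster prototypes instead of all N nodes, so the score matrix is
# N x K rather than N x N.
import torch
import torch.nn as nn

class N2CAttention(nn.Module):
    def __init__(self, dim: int, num_clusters: int):
        super().__init__()
        # K learnable cluster prototypes serving as keys/values (assumed design).
        self.prototypes = nn.Parameter(torch.randn(num_clusters, dim))
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) node features; attention cost is O(N * K), near-linear in N.
        q = self.q(x)                 # (N, dim)
        k = self.k(self.prototypes)   # (K, dim)
        v = self.v(self.prototypes)   # (K, dim)
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (N, K)
        return attn @ v               # (N, dim)
```

For instance, with N = 10,000 nodes and K = 8 clusters, the score matrix has 80,000 entries rather than the 100 million a dense N x N attention would require.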
Review 2

Paper: SimPlex-GT: A Simple Node-to-Cluster Graph Transformer for synergizing homophily and heterophily in Complex Graphs
Soundness: 3 (good) | Presentation: 2 (fair) | Contribution: 3 (good)
Rating: 6 (marginally above the acceptance threshold)
Confidence: 3 (fairly confident; math/other details were not carefully checked)

Summary:
Traditional Graph Neural Networks (GNNs) struggle to handle graph structures with mixed patterns because of their inherent smoothing operations. To address this, the paper proposes SimPlex-GT, a novel graph transformer. The model uses traditional local Graph Convolutional Networks (GCNs) for message passing to handle homophily, while complementary filtering and cluster smoothing mechanisms, built into enhanced node attention and a global node-to-cluster attention mechanism, handle heterophily, allowing SimPlex-GT to process complex structural patterns effectively. For training, SimPlex-GT adopts a self-supervised framework with masked node modeling as the primary pretext objective and a teacher-student prediction architecture, and introduces a node-difficulty-driven dynamic masking strategy that adaptively adjusts the masking process so the model learns more robust and informative representations. Through comprehensive theoretical analysis and strong empirical efficiency results, the authors show that SimPlex-GT remains highly competitive on homophilic benchmark graphs while improving memory and training efficiency.

Strengths:
The paper is innovative, proposing two novel designs: complementary filtering and cluster smoothing. In complementary filtering, low-frequency information serves as the prototype while high-frequency signals act as effective queries. By injecting graph-structure awareness into the input (via complementary filtering) or the output (via cluster smoothing) of the node-to-cluster module, the design exploits the best characteristics of the two proposed branches to handle homophily and heterophily jointly: the local branch focuses on neighborhood similarity, while the global branch centers on dynamically learned cluster prototypes. A learnable gating mechanism integrates these complementary views, and orthogonality constraints encourage representational diversity, yielding more robust node representations. The training scheme adopts masked node modeling as the primary pretext objective with a teacher-student prediction architecture, and balances task difficulty against data diversity by grading difficulty levels, maintaining a base masking rate, and preventing biased sampling.

Weaknesses:
In designing the two branches that synergize homophily and heterophily, the authors prioritized simplicity and opted for single-layer GCNs, while occasionally noting that stacking multiple modules could improve performance. It therefore appears that maximizing performance was not the primary consideration in the model's construction. Moreover, some variables and operations in the formulas, such as those in the gating mechanism, are not explained in detail.

Questions:
1. The paper employs single-layer GCNs in multiple places. (a) In complementary filtering, nodes are split into two complementary channels by a low-pass filter implemented and approximated with a single-layer GCN; can such a simple single-layer filter achieve the desired effect? (b) When using the graph structure to address homophily, a single-layer GCN branch is introduced as a residual path to capture this information and is fused into the heterophily-oriented N2C output; can this single-layer GCN effectively capture the information, or is the model too simplistic?
2. In lines 290-294, the authors state, "Note that, similar to other baselines (Rampášek et al., 2022), above designs can be viewed as a building block in our framework, and multiple blocks can be easily stacked to enhance the model's expressive power. In our experiments, we retain a single block for simplicity." Is it appropriate to forgo the stated gain in expressive power for the sake of simplicity?
3. The node-to-cluster (N2C) attention section refers to meaningful and compact attention-joint clustering. How can one determine whether an attention pattern is meaningful and compact?
4. The teacher output S(v;φ) and the student output T(v;ψ) are not defined in the paper.
5. The authors call it "smart cosine similarity," but from the formula it appears to be an ordinary cosine similarity with $\lambda_{orth}$ as a balancing parameter. Where does the "smart" aspect come in?
6. The paper repeatedly states that orthogonality regularization promotes representational diversity; for example, an auxiliary orthogonality regularization term is introduced to encourage the two branches to learn mutually enhancing features. Specifically, which quantities are subject to the orthogonality constraint?
7. How is the gating mechanism implemented? (A hedged sketch of one plausible implementation of questions 1, 6, and 7 follows this review.)

EditLens Prediction: Lightly AI-edited
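As a concrete reference point for questions 1, 6, and 7 above, here is a minimal sketch, assuming the low-pass filter is one step of normalized-adjacency smoothing, the gate is a sigmoid over the concatenated branches, and the orthogonality term penalizes cosine similarity between branch outputs. All names and shapes are assumptions, not the paper's implementation.

```python
# Hypothetical complementary split, gated fusion, and orthogonality penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

def complementary_split(adj_norm: torch.Tensor, x: torch.Tensor):
    # Low-pass channel: single-layer GCN-style smoothing (A_hat @ X);
    # high-pass channel: the residual that the smoothing removes.
    x_low = adj_norm @ x
    x_high = x - x_low
    return x_low, x_high

class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from both branches; sigmoid gives per-dimension weights.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_local: torch.Tensor, h_global: torch.Tensor):
        g = torch.sigmoid(self.gate(torch.cat([h_local, h_global], dim=-1)))
        fused = g * h_local + (1.0 - g) * h_global
        # Orthogonality regularizer between the two branch outputs: drive their
        # cosine similarity toward zero so they learn complementary features.
        orth_loss = F.cosine_similarity(h_local, h_global, dim=-1).pow(2).mean()
        return fused, orth_loss

# Assumed training objective: total_loss = task_loss + lambda_orth * orth_loss,
# with lambda_orth the balancing parameter question 5 refers to.
```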
Review 3

Paper: SimPlex-GT: A Simple Node-to-Cluster Graph Transformer for synergizing homophily and heterophily in Complex Graphs
Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 2 (fair)
Rating: 4 (marginally below the acceptance threshold)
Confidence: 3 (fairly confident; math/other details were not carefully checked)

Summary:
This paper presents a Graph Transformer architecture called SimPlex-GT, which aims to provide a unified representation-learning framework for complex graphs with both homophilic and heterophilic properties. The core contributions are threefold: 1) a scalable, linear-complexity node-to-cluster (N2C) attention mechanism that approximates global attention by letting nodes attend to a set of dynamically learned cluster prototypes; 2) two theoretically motivated synergy mechanisms, Complementary Filtering (CF) and Cluster Smoothing (CS), that fuse local information from the GCN branch with global information from N2C attention; and 3) a novel self-supervised learning paradigm that employs a difficulty-driven dynamic masking strategy under a teacher-student framework (a hedged sketch of one possible reading of this strategy follows this review). The authors demonstrate the effectiveness and efficiency of the approach through theoretical analysis and extensive experiments on 11 benchmark datasets.

Strengths:
1. The presentation is clear overall, making the framework and results easy to follow.
2. Each core design (N2C, CF, CS) comes with corresponding theoretical support (Theorems 1-4), which increases the method's credibility and clearly explains its working principles (e.g., the variance-reduction property of N2C).
3. The experimental section is well structured, with comprehensive benchmarking on multiple datasets. The detailed ablation studies (Tables 4, 5, 6, 7, 8) systematically validate the necessity and effectiveness of each component, making the experimental conclusions reliable.

Weaknesses:
1. The model's SOTA performance results from both the novel architecture (N2C + CF/CS) and the powerful self-supervised training strategy (dynamic masking). The ablation study (Table 5) does not disentangle the respective contributions of the training strategy and N2C to performance.
2. In the methods section, the authors position the SimPlex-GT module as a general building block that can be "easily stacked to enhance model expressiveness" (Page 6, Lines 289-291), yet in the experiments they explicitly state, "For simplicity, we only kept a single module." Given the well-known depth bottleneck of Graph Transformers (stacking more layers can degrade performance), validating the method only in a single-layer setup weakens the experimental support for the stackability claim.
3. One motivation for introducing N2C attention is exploiting "hierarchical node structures" within the graph (Page 3, Lines 160-161). This is a strong entry point, but the subsequent theoretical and experimental analysis never revisits the idea: the theory focuses on variance reduction rather than hierarchical representation capability (Theorems 2 and 4). As a result, the initial motivation is never fully closed in the final analysis.

Questions:
See weaknesses.

EditLens Prediction: Lightly AI-edited
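The following is a minimal sketch of one plausible reading of "difficulty-driven dynamic masking" as the review describes it: nodes where the student disagrees most with an EMA teacher are treated as hard and masked preferentially, on top of a base masking rate that preserves data diversity. The function names, rates, and the EMA update rule are assumptions for illustration, not the paper's method.

```python
# Hypothetical difficulty-driven dynamic masking in a teacher-student setup.
import torch

@torch.no_grad()
def difficulty_driven_mask(student_out, teacher_out, base_rate=0.3, extra_rate=0.2):
    # Per-node difficulty: disagreement between student and (EMA) teacher outputs.
    difficulty = (student_out - teacher_out).pow(2).sum(dim=-1)    # (N,)
    n = difficulty.numel()
    # Base masking rate keeps sampling diverse and avoids bias toward hard nodes.
    mask = torch.rand(n, device=difficulty.device) < base_rate
    # Extra masking budget is spent on the hardest nodes.
    hard = difficulty.topk(int(extra_rate * n)).indices
    mask[hard] = True
    return mask                                                    # boolean (N,)

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # Teacher parameters track the student via an exponential moving average.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```

Under this reading, the collinearity concern in Weakness 1 amounts to asking whether gains attributed to N2C would persist if this masking schedule were replaced with uniform random masking.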
Review 4

Paper: SimPlex-GT: A Simple Node-to-Cluster Graph Transformer for synergizing homophily and heterophily in Complex Graphs
Soundness: 1 (poor) | Presentation: 2 (fair) | Contribution: 2 (fair)
Rating: 2 (reject)
Confidence: 4 (confident, but not absolutely certain)

Summary:
This paper proposes SimPlex-GT, a simple yet efficient Graph Transformer framework that synergizes homophilic and heterophilic structures in graphs by combining local GNN message passing with a global node-to-cluster attention mechanism. Trained under a self-supervised teacher-student framework with a dynamic masking strategy that focuses on challenging nodes, the model achieves efficient, stable, and state-of-the-art node representation learning across diverse graph datasets.

Strengths:
The paper proposes a Graph Transformer that attempts to address the challenges of heterophilic graphs while reducing time complexity, and the experimental results indicate that the approach is effective.

Weaknesses:
The paper presents numerous theorems, but many of them are problematic.
1. The theoretical analysis relies primarily on an overly strong assumption: that node features can be directly aligned with labels. This assumption detaches the entire problem from its graph-based nature; under it, logistic regression is already sufficient to solve the task.
2. Certain assumptions are used in the proofs but not stated in the theorems, rendering the theorems either over-claimed or incorrect. For example, the proof of Theorem 1 assumes the GNN's activation is linear and its weights are the identity matrix (Line 761), yet the theorem itself does not state this (a sketch of the reduced model this assumption yields follows this review). Incidentally, this assumption makes the GNN even weaker than logistic regression.
3. In practice, GNNs can achieve classification performance on heterophilic graphs that exceeds the bound the authors propose, which undermines the validity of their theorem.
4. Theorem 1 analyzes what appears to be GCNs rather than GNNs in general; many simple GNNs, such as GAT, can achieve performance close to the ground truth under this theoretical framework.

Minor issues:
1. The method does not appear applicable to graphs that cannot be clustered.
2. No code was provided for reviewers to examine.

Given that the theoretical section constitutes a significant portion of the paper and may contain the issues above, I believe this paper is not ready for publication.

Questions:
See weaknesses.

EditLens Prediction: Lightly AI-edited
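To illustrate Weakness 2: under the proof's stated assumptions (identity weights, linear activation), a single GCN layer collapses to one step of symmetric normalized feature smoothing, with no learnable parameters at all. A minimal sketch of that reduced model, assuming the standard GCN normalization with self-loops:

```python
# Reduced model implied by the assumption the review cites: a single-layer GCN
# with W = I and linear activation is just H = A_hat @ X, where
# A_hat = D^{-1/2} (A + I) D^{-1/2}. With no trainable weights, it is indeed
# weaker than logistic regression on X, as the review argues.
import numpy as np

def gcn_identity_linear(adj: np.ndarray, x: np.ndarray) -> np.ndarray:
    a = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))   # D^{-1/2}
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return a_hat @ x                            # W = I, no nonlinearity
```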