ICLR 2026 - Reviews

Reviews

Summary Statistics

EditLens Prediction    Count     Avg Rating   Avg Confidence   Avg Length (chars)
Fully AI-generated     0 (0%)    N/A          N/A              N/A
Heavily AI-edited      0 (0%)    N/A          N/A              N/A
Moderately AI-edited   0 (0%)    N/A          N/A              N/A
Lightly AI-edited      0 (0%)    N/A          N/A              N/A
Fully human-written    3 (100%)  6.67         3.00             3061
Total                  3 (100%)  6.67         3.00             3061
Review 1

Title: LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
The paper constructs a graph regression dataset based on the well-known Ising model as a dataset for long-range dependencies. The authors show that long-range interactions are indeed needed to solve the problem, and that deeper networks tend to work much better than shallow ones. They also provide an oracle value for $k$-hop GNNs that is far better than the trained networks, indicating sufficient complexity to be challenging.

Strengths:
The main strength of the suggested dataset is that it addresses one of the key problems in graph machine learning: long-range interactions. Existing datasets are often purely empirical (maybe except for the road networks from Liang et al., 2025) instead of being principled. The dataset satisfies a number of desirable properties, such as coming in varying complexities and sizes (including large graphs with 65k nodes each) while leaving a significant performance gap between a simple restricted oracle and existing GNNs.

Weaknesses:
Overall, I am not yet convinced that the dataset is really what we are looking for, mostly because the graphs are extremely simple and, as far as I understood the task, it is not so much about interactions influencing spin patterns, but rather about aggregating information from far away in a way that is mostly independent of information that has already been digested. Concretely, there are a few aspects of the dataset that I consider not too strong:
- It could have been much clearer how exactly the LRIM graphs are generated from the background described before, especially for graph-learning experts who have not worked with the Ising model before. Apart from that, the paper is well-written and easy to follow.
- The graphs are extremely simple (just lattices, even simpler than the road networks from Liang et al.). Thus the task is really an oversimplified edge case for graph learning.
- There is not that much interaction going on, especially since on a regular lattice all $J_{ij}$ are the same.
- I was not convinced that the construction really tests interaction rather than just global aggregation (see questions).
- The provided lower bound states that there exists a solution that is "very different", but says nothing about the distribution of such solutions. In particular, I believe it is possible to "cheat" using global statistics.

Concrete (small) things:
- Since LRIM uses lattice graphs, positional embeddings should make looking at the edges irrelevant (e.g., using a PE that is made for images and able to encode an x and y position); a minimal sketch of such an embedding follows this list.
- The task requires very exact computations, which tends not to be the strongest suit of machine learning models. And at some point floating-point precision will become problematic (probably well before the trivial accuracy boundary at -20).
- In the experiments, it looked to me like considerably larger MPGNNs would have been possible without exceeding the computational demands of the tested graph transformers. Is there a reason why only "small" models were tested? How does a 50M GatedGCN model perform?
- Line 422: Maybe I was misreading the plot, but the numbers for the oracle and the learned method are not far apart for up to 12 layers (Fig. 4). I do not agree with the conclusion made here.
- Line 475: As soon as we know the distances, the graph itself becomes largely unimportant. So it is only partially about graph learning, I'd say.
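
To make the positional-embedding point concrete, here is a minimal sketch of a 2D sinusoidal embedding for an H x W lattice, assuming row-major node ordering (function name and sizes are my own, not the paper's):

```python
# Hypothetical 2D sinusoidal positional embedding for an H x W lattice.
import numpy as np

def grid_positional_embedding(H, W, d):
    """Concatenate 1D sin/cos encodings of each node's y and x coordinates."""
    assert d % 4 == 0, "d must be divisible by 4 (sin and cos per axis)"

    def encode(pos, dim):
        # pos: (N,) integer coordinates -> (N, dim) sinusoidal features
        freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
        angles = pos[:, None] * freqs[None, :]
        return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pe_y = encode(ys.ravel(), d // 2)            # (H*W, d/2)
    pe_x = encode(xs.ravel(), d // 2)            # (H*W, d/2)
    return np.concatenate([pe_y, pe_x], axis=1)  # (H*W, d)

# e.g. a 256-node LRIM graph, presumably a 16 x 16 grid:
pe = grid_positional_embedding(16, 16, 64)
```

With such features, every node knows its absolute grid position, so the edge structure carries no additional information.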

Questions:
1. Is it really "interaction" or rather "aggregation" that is happening in the Ising model, especially when it is about energy prediction as in LRIM? (See the sketch after these questions.)
2. In the Monte Carlo simulation that is used to simulate the system, I do not really understand how deterministic this is and how exactly it is used for LRIM.
3. When going for more complex graphs
4. How do you rate the possibility of cheating for a model based on, e.g., global statistics, thus outperforming the oracle, which has limited information (but uses that information optimally)? And in that context, how helpful is the provided lower bound?
5. Do you have an intuition for why LapPE did not perform at all, and why it has this odd curve in Fig. 4?
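
For reference, under my reading of the setup (assuming the standard long-range Ising Hamiltonian with power-law couplings, which may not match the paper's exact definitions), the per-node target reduces to a weighted aggregation:

$$E(s) = -\sum_{i<j} J_{ij}\, s_i s_j, \qquad J_{ij} \propto d(i,j)^{-\alpha}, \qquad s_i \in \{-1,+1\},$$

$$\Delta E_i = E(s \text{ with } s_i \text{ flipped}) - E(s) = 2\, s_i \sum_{j \neq i} J_{ij}\, s_j.$$

If this is the target, it is a linear readout of distant spins with fixed coupling weights (question 1), and it is also exactly the quantity entering the Metropolis acceptance step of a Monte Carlo simulation (question 2).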

EditLens Prediction: Fully human-written

Review 2

Title: LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This paper introduces a physics-based dataset, the Long-Range Ising Model (LRIM) Graph Benchmark, to measure the long-range modeling capabilities of graph neural networks. The benchmark builds on the Ising model with power-law interactions, where the target task provably depends on long-range dependencies. The paper provides 10 datasets ranging from 256 to 65k nodes, with difficulty controlled by a tunable parameter that inversely scales the interaction strength between nodes. The analysis shows that local information is insufficient, a theoretical study of long-rangedness measures is given, and empirical evaluations demonstrate that both message-passing architectures and graph transformers perform poorly. The entire dataset is synthetically generated, and the graphs are 4-regular, 2D grid-like lattices.

Strengths:
- The use of the Ising model gives the datasets a physics-based foundation in which long-range dependencies are mathematically guaranteed and controllable. This is unlike some prior long-range graph benchmarks, such as superpixels, where long-rangedness is not mathematically guaranteed.
- Compared to previous benchmarks, which demonstrate the long-rangedness of tasks via the performance of different model classes, this work provides an elaborate analysis of the proposed dataset with an oracle predictor, theoretical lower bounds, and a long-rangedness metric.
- The task difficulty can be tuned, as demonstrated with examples in Figure 3. In addition, there are clear performance gaps between message-passing networks and full-neighborhood graph transformers, as shown in Tables 2 and 3.
- The proposed collection of datasets, spanning a range of sizes and difficulties, can be used for developing long-range graph networks, alongside other recent works/datasets that study this topic.

Weaknesses:
- As acknowledged by the paper, the benchmark is limited to regular lattice structures. This is significant since real-world graphs rarely have such regular topology, and message-passing GNNs may not be the best architecture here; the grid structure may favor certain architectural choices.
- In addition, methods designed specifically for grid-like data are excluded; including them could show whether graph-specific networks are necessary in such settings.
- A major limitation, which the paper acknowledges, is the real-world applicability of the datasets.

Questions:
One observation: GPS shows OOM on LRIM-256 in Table 3. Could a more approximate alternative to GPS be included, specifically to fill in the missing scores here?

EditLens Prediction: Fully human-written

Review 3

Title: LRIM: a Physics-Based Benchmark for Provably Evaluating Long-Range Capabilities in Graph Learning
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
The paper introduces LRIM, a physics-based benchmark built on the Ising model that provably depends on long-range interactions, addressing gaps in existing graph-learning benchmarks. Concretely, it proposes a node-regression task: predicting per-node energy changes on 2D grid graphs that model different spin configurations of an Ising model. A theoretical analysis shows how the dependence on long-range patterns can be directly controlled in this setting. The reported empirical results show that graph transformers significantly outperform local MPNNs in this setting, although at a significantly increased computational cost.

Strengths:
I think the suggested task is a valuable addition to the existing set of graph-learning benchmarks for studying long-range information in a controlled setting. In particular:
1. Obtaining a provably long-range graph-learning task from the Ising model is an original idea and addresses the main problem of prior "long-range" benchmarks, for which the justification of long-rangedness was purely empirical.
2. The reported results show a clear separation between local and global architectures.
3. The large graph sizes of up to 65k nodes are challenging for standard graph transformers and seem like a good test bed for developing more efficient long-range architectures.
4. The presentation is clear, and key details like hyperparameter budgets are fully provided.

Weaknesses:
The restriction to regular 2D grids is a weakness in the context of graph learning, as the main feature of GNNs is their ability to process arbitrary graph structures. I think this is an acceptable weakness for a benchmark that intends to be complementary to "real-world" datasets, but a broader range of graph structures would ultimately be more convincing.

The set of provided baselines also misses MPNNs with virtual nodes [1], a standard trick for propagating global information in graphs. It would be very interesting to see how such architectures perform on this dataset, as virtual nodes allow global information aggregation but lack the pairwise global interactions of transformers that seem to align well with the suggested task. A minimal sketch of the virtual-node mechanism is given below.

[1] Gilmer, Justin, et al. "Neural message passing for quantum chemistry." International Conference on Machine Learning. PMLR, 2017.
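
The sketch (hypothetical code, not the paper's architecture) shows how a virtual node gives every node access to a pooled global summary at O(N + E) cost per layer:

```python
# Hypothetical message-passing layer with a virtual node, in the spirit of
# Gilmer et al. (2017).
import torch
import torch.nn as nn

class VirtualNodeLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.local = nn.Linear(2 * d, d)   # combine node state with neighbor sum
        self.vn_update = nn.Linear(d, d)   # update the shared virtual node

    def forward(self, x, edge_index, vn):
        # x: (N, d) node features; edge_index: (2, E); vn: (1, d) global state
        src, dst = edge_index
        agg = torch.zeros_like(x).index_add_(0, dst, x[src])  # sum over neighbors
        x = torch.relu(self.local(torch.cat([x, agg], dim=-1)))
        x = x + vn                                            # broadcast global state
        vn = torch.relu(self.vn_update(x.mean(dim=0, keepdim=True)))  # pooled readout
        return x, vn
```

Such a baseline would probe whether LRIM truly requires pairwise global interactions or merely global aggregation.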

Questions:
1. What is the numerical range of the regression target $\Delta E_i$? Do the targets need to be normalized for training?
2. Given that the graphs are currently regular 2D grids, would it be reasonable to use the same task for benchmarking vision models like CNNs or Vision Transformers? (A toy sketch of this casting follows.)
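
A toy version of that casting, assuming row-major node ordering (my assumption, not stated in the paper):

```python
# Hypothetical sketch: treating an LRIM grid graph as an image for a CNN baseline.
import torch
import torch.nn as nn

H = W = 16                                  # e.g. a 256-node graph as a 16 x 16 grid
spins = torch.randint(0, 2, (H * W, 1)).float() * 2 - 1  # dummy +/-1 spin features
image = spins.view(1, 1, H, W)              # (batch, channels, height, width)

cnn = nn.Sequential(                        # receptive field grows by 2 per 3x3 conv
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),                    # per-pixel (= per-node) regression head
)
pred = cnn(image).view(H * W)               # one predicted target per node
```

Note that a plain CNN has the same locality limitation as an MPNN, so only globally attending vision models (e.g., a ViT) would exploit the long-range structure.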

EditLens Prediction: Fully human-written