ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction   | Count    | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated    | 1 (25%)  | 2.00       | 4.00           | 3626
Heavily AI-edited     | 1 (25%)  | 4.00       | 3.00           | 1742
Moderately AI-edited  | 0 (0%)   | N/A        | N/A            | N/A
Lightly AI-edited     | 1 (25%)  | 8.00       | 3.00           | 1711
Fully human-written   | 1 (25%)  | 8.00       | 5.00           | 2482
Total                 | 4 (100%) | 5.50       | 3.75           | 2390

Individual Reviews

Title: Rapid Training of Hamiltonian Graph Networks Using Random Features
Ratings: Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 2 (fair) | Rating: 8 (accept, good paper) | Confidence: 5 (absolutely certain; very familiar with the related work, math/other details checked carefully)
EditLens Prediction: Fully human-written

Summary:
This paper proposes a novel non-iterative method to train Hamiltonian Graph Neural Networks (GNNs) orders of magnitude faster than standard optimizers. The approach leverages random feature sampling over the small shared node, edge, and message MLPs of the GNN, followed by a least-squares fit to determine the final linear layer that outputs the Hamiltonian. To ensure translation and rotation invariance, the particle positions are processed by aligning the simulation frame of reference to the center of mass and rotating the coordinates to a fixed orthonormal basis. The proposed training algorithm is evaluated on four different graph configurations and compared against multiple standard optimizers, achieving at least two orders of magnitude speedup (even outperforming the second-order L-BFGS optimizer). Furthermore, the method is benchmarked against seven state-of-the-art structure-preserving architectures using publicly available datasets.

Strengths:
* The method is tested on NeurIPS 2022 open-source benchmarks.
* The paper proposes an interesting way to enforce rotation invariance in the system.
* The experiments are varied, exploring N-body systems (chains, regular grids) and molecular interactions in Lennard-Jones systems.
* The generalization and rollout results are good. These are precisely the experiments needed to show the power of structure preservation in learning conservative dynamics.

Weaknesses and questions:
* Even though the method is tested on a wide variety of systems, they are still small-scale toy problems.
* Section 4.3: Is there a particular reason for choosing Hamiltonian GNNs instead of, say, Lagrangian GNNs? Have the authors tried to train the tested architectures (GNODE, LGNN, FGNN, etc.) with the presented method to see which architecture performs better?
* Have the authors tried simpler invariance tricks, such as using relative distances or relative angles?
* Line 1079: $5^{-3}$ might refer to $5 \cdot 10^{-3}$?
* Line 1079: It is specified here that the molecular dynamics experiments are only rolled out to 50 timesteps. However, Table 2 shows T=99999 timesteps. Which is correct? 50 timesteps is a very short time horizon for a molecular system, given that the timestep must be very small for stability.
* Equation 6: Is the least-squares minimization well-posed? Have the authors encountered any problems when training? I am thinking of a bad condition number for the normal equations $Z^T Z$, or degeneracies induced by the rotation-translation invariance.
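
For readers skimming the reviews, a minimal sketch of the recipe summarized above (random hidden features sampled once, final linear layer fit by least squares) is given below. It is an illustration under stated assumptions, not the authors' implementation: the feature map, dimensions, and toy data are made up, and the ridge term is added here as one standard way to address the conditioning concern raised in the last question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression standing in for the Hamiltonian readout: n samples of
# d-dimensional (already invariant) features and one scalar target each.
n, d, m = 2000, 6, 256                      # samples, input dim, random-feature width
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

# 1) Sample the hidden-layer weights once and freeze them (ELM-style Gaussian
#    sampling; SWIM would instead construct weights from pairs of data points).
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, m))
b = rng.normal(size=m)
Z = np.tanh(X @ W + b)                      # random-feature matrix, shape (n, m)

# 2) Fit only the final linear layer in closed form. A small ridge term keeps
#    the normal equations Z^T Z well conditioned.
lam = 1e-6
A = Z.T @ Z + lam * np.eye(m)
beta = np.linalg.solve(A, Z.T @ y)

pred = Z @ beta
print("train RMSE:", np.sqrt(np.mean((pred - y) ** 2)))
print("condition number of the regularized normal equations:", np.linalg.cond(A))
```

In the actual model, the feature matrix would collect the pooled random features produced by the graph network's frozen MLPs, and the same closed-form solve would determine the Hamiltonian readout; a ridge parameter or an SVD-based solver is the usual safeguard when $Z^T Z$ is poorly conditioned.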

Title: Rapid Training of Hamiltonian Graph Networks Using Random Features
Ratings: Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 4 (excellent) | Rating: 8 (accept, good paper) | Confidence: 3 (fairly confident; some parts of the submission or related work may not have been fully understood; math/other details not carefully checked)
EditLens Prediction: Lightly AI-edited

Summary:
This paper introduces RF-HGN, a method for training Hamiltonian Graph Networks using random feature sampling instead of iterative gradient-descent optimization. The authors demonstrate significant training speedups compared to other optimizers (Adam, LBFGS, etc.) while maintaining competitive accuracy on mass-spring and molecular dynamics systems.

Strengths:
The proposed method offers a significant speedup without sacrificing accuracy in multiple systems.
- Comprehensive benchmarking.
- Elegant solution merging Hamiltonian Graph Networks (for physics-informed modeling), random features (for fast, non-iterative training), and careful construction of physical invariances (translation, rotation, permutation).
- Generalizability from small to larger systems.

Weaknesses:
- The paper could give more background on why random features work.
- The limitation should be acknowledged more prominently and discussed.
- The acceleration is significant, but the claimed 600× speed-up is overstated; compared to the second-best model, it is actually 150×.

Questions:
- The method is presented in the context of Hamiltonian systems. How readily can it be applied to other physics-informed graph networks, such as Lagrangian or Port-Hamiltonian networks, or even non-conservative systems? Does the "random features + linear solve" recipe generalize?
- The method demonstrates impressive zero-shot generalization to larger systems, but does this generalization hold for structurally heterogeneous systems? For instance, if a model is trained on a regular lattice (where all nodes have the same degree), can it accurately predict the dynamics for a system with a mix of node degrees, or for a node with an unexpectedly high degree not seen during training?
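
Both of the reviews above highlight the invariance construction (per the first review: centering on the center of mass and rotating the coordinates to a fixed orthonormal basis). The sketch below shows one common way to build such a canonical frame; it is an assumption for illustration only, and the paper's exact construction may differ, for example in how the basis and its sign ambiguities are fixed.

```python
import numpy as np

def canonicalize(positions):
    """Map particle positions into a translation- and rotation-invariant frame.

    Translation is removed by subtracting the center of mass (equal masses
    assumed here); rotation is removed by expressing the centered coordinates
    in an orthonormal basis derived from the configuration itself (principal
    axes from an SVD). Note the per-axis sign ambiguity of singular vectors.
    """
    centered = positions - positions.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

# A rotated and translated copy of the same configuration maps to the same
# canonical coordinates, up to the sign ambiguity noted above.
rng = np.random.default_rng(1)
pos = rng.normal(size=(8, 3))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
moved = pos @ R.T + np.array([3.0, -1.0, 2.0])
print(np.allclose(np.abs(canonicalize(pos)), np.abs(canonicalize(moved))))
```

Permutation invariance, by contrast, would come from the graph network itself (shared MLPs with symmetric aggregation) rather than from this coordinate preprocessing.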

Title: Rapid Training of Hamiltonian Graph Networks Using Random Features
Ratings: Soundness: 2 (fair) | Presentation: 3 (good) | Contribution: 2 (fair) | Rating: 2 (reject) | Confidence: 4 (confident, but not absolutely certain; unlikely but not impossible that some parts of the submission or related work were not understood)
EditLens Prediction: Fully AI-generated

Summary:
This paper proposes RF-HGN, a method to accelerate training of Hamiltonian Graph Networks by replacing gradient-based optimization with random feature sampling and least-squares solvers. The approach samples dense layer parameters randomly (using ELM or SWIM) and optimizes only the final linear layer, achieving 100-600× speedups while maintaining comparable accuracy. The method demonstrates zero-shot generalization from small training systems (8 nodes) to large test systems (4096 nodes) across mass-spring, lattice, and molecular dynamics systems.

Strengths:
1. Dramatic Practical Speedups: The 100-600× training acceleration is genuinely impressive and could enable new applications in physics simulation.
2. Strong Zero-Shot Generalization: Training on 8-node systems and testing on 4096-node systems demonstrates remarkable scalability; this is perhaps the paper's most valuable contribution.
3. Comprehensive Experimental Validation:
   - Comparison against 15 different optimizers provides robust baselines.
   - Multiple physical systems (springs, lattices, molecular dynamics).
   - Use of an established NeurIPS 2022 benchmark dataset.
4. Physical Consistency: The method maintains energy conservation and incorporates essential symmetries (translation, rotation, permutation invariance).
5. Clear Algorithmic Contribution: The two-stage training procedure (random sampling + least squares) is well defined and reproducible.

Weaknesses:
1. Weak Theoretical Foundation:
   - No convergence guarantees or approximation bounds.
   - Limited analysis of when/why the method works.
   - Missing connection to random feature theory for this specific setting.
2. Poorly Motivated Architectural Choices:
   - No compelling justification for choosing HNNs over alternatives (Lagrangian, Port-Hamiltonian, etc.).
   - Graph networks are not well motivated for many test systems (regular lattices are better suited to CNNs).
   - Missing literature review of Hamiltonian graph neural networks.
3. Limited System Complexity:
   - Mostly simple spring-mass systems and basic molecular dynamics.
   - ~10% relative error on Lennard-Jones systems suggests limitations for complex potentials.
   - No testing on truly challenging physics (e.g., turbulence, phase transitions).
4. Accuracy Trade-offs Not Well Characterized:
   - Sometimes less accurate than second-order methods (LBFGS).
   - Large variance in results (Table 1) raises questions about reliability.
   - No principled way to predict accuracy vs. speed trade-offs.
5. Scalability Questions:
   - Memory complexity O(MNe) may become prohibitive for very large systems.
   - Linear solver bottleneck O(Kd²L) not thoroughly analyzed.
   - Integration constant handling appears ad hoc.
6. Missing Key Comparisons:
   - No comparison with other physics-informed ML acceleration techniques.
   - No baseline comparisons with structure-specific alternatives (CNNs for lattices).
   - Limited comparison with other random feature applications to physics.

Questions:
1. Can you provide convergence guarantees or approximation bounds for the random feature approach in the physics-informed setting?
2. Why specifically Hamiltonian neural networks? How does performance compare when applying random features to Lagrangian or other physics-informed architectures?
3. Can you provide principled guidelines for when accuracy degradation becomes significant? What physical properties are most affected?
4. What are the practical limits of your approach? At what system size does the linear solver become prohibitive?
5. How does the method perform on more challenging physical systems beyond simple spring-mass dynamics?

Title: Rapid Training of Hamiltonian Graph Networks Using Random Features
Ratings: Soundness: 2 (fair) | Presentation: 2 (fair) | Contribution: 2 (fair) | Rating: 4 (marginally below the acceptance threshold) | Confidence: 3 (fairly confident; some parts of the submission or related work may not have been fully understood; math/other details not carefully checked)
EditLens Prediction: Heavily AI-edited

Summary:
The paper introduces Random-Feature Hamiltonian Graph Networks, whose hidden layers are fixed random features, with only the final linear readout solved in one least-squares step instead of iterative gradient descent. The model encodes translation/rotation/permutation invariances, reports 100-600× training-time speedups over many optimizers on mass-spring and Lennard-Jones systems, and shows zero-shot scaling from tiny to very large graphs without retraining.

Strengths:
1. Replacing long iterative gradient descent with a single convex least-squares solve is simple and yields substantial speedups without complicated tuning.
2. The model encodes translation, rotation, and permutation invariances appropriate for N-body dynamics, which improves data efficiency and generalization within a family.
3. Demonstrates training on small systems and inference on much larger ones without retraining.

Weaknesses:
1. The core idea is somewhat similar to reservoir computing: both avoid training the feature-generation module and optimize only the final readout layer.
2. With a single message-passing stage and a linear head, it is unclear how the method scales to long-range or multi-scale interactions. There is no ablation on stacking multiple RF blocks.
3. The training formulation assumes no external forces and exact energy conservation. Robustness to mild non-conservation (damping, stochasticity) is only lightly tested.
4. Transfer across distinct topologies is under-explored.

Questions:
1. What are the key differences between your approach and reservoir computing?
2. What happens with 2-3 random-feature message-passing blocks versus a single block? How does performance scale with feature width, and do you observe conditioning issues in the least-squares solve as capacity grows?