Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
**Summary**
This paper focuses on mitigating out-of-distribution (OoD) inference of neural operators. The method is based on the Buckingham $\pi$-theorem, which decomposes the parameter space into two parts: the null space $\ker(\Phi^T)$ and the component perpendicular to $\ker(\Phi^T)$, where $\Phi=[\phi^{(1)}\cdots\phi^{(p-r)}]$ stacks the null-space basis vectors (see Theorem 1). Thus, a test point can be transformed, i.e., projected, to the point closest to the training data while preserving its $\pi$ values. Experiments demonstrate effectiveness on OoD data, with performance superior to the baselines.
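To make the geometry concrete, here is a minimal sketch of my reading of this projection (the function name and array shapes are my own, not the authors' code):

```python
import numpy as np

def pi_preserving_projection(log_x_test, log_x_train, Phi):
    """Project a test sample's log-parameters toward a training sample
    while keeping every Buckingham-pi group fixed.

    Phi : (p, p - r) array whose columns are the null-space basis
          vectors phi^(1), ..., phi^(p-r) from Theorem 1, so that the
          j-th dimensionless group is pi_j = exp(phi^(j) . log x).
    Returns the z minimizing ||z - log_x_train||_2 subject to
    Phi^T z = Phi^T log_x_test (same pi values as the test point).
    """
    G = Phi.T @ Phi                                # Gram matrix, (p-r, p-r)
    mismatch = Phi.T @ (log_x_test - log_x_train)  # pi gap in log space
    # Move the training point along range(Phi), the orthogonal complement
    # of ker(Phi^T), until its pi values match those of the test point.
    return log_x_train + Phi @ np.linalg.solve(G, mismatch)
```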
**Strengths**
**Originality** is high. The method leverages the Buckingham $\pi$-theorem and addresses the OoD problem innovatively from a structured perspective, i.e., the data space can be decomposed into equivalence classes generated by the training data $(X_i, Y_i)$.
**Clarity** is good in general, except for some minor issues (see Weaknesses). The diagrams and figures are illustrative and helpful.
**Weaknesses**
Regarding **clarity**:
1. An algorithm that summarizes the whole procedure would be helpful to readers, especially one corresponding to the predict & inverse steps in Fig. 3 (see the sketch after this list for the kind of pipeline I have in mind). How is the inverse performed in general?
2. Above equation (11), should $\tilde{X}^{\star}$ be $\tilde{X}^{*}$ as in eq. (11)?
3. In Fig. 1, what do the purple circles and yellow circles stand for?
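A minimal sketch of the pipeline I am guessing at (the names `projection` and `output_scale` are hypothetical placeholders, not the authors' API):

```python
def project_predict_invert(x_test, model, projection, output_scale):
    # 1. Project the test input onto the pi-equivalent point nearest the
    #    training distribution, recording the scale factors s that were applied.
    x_proj, s = projection(x_test)
    # 2. Run the pretrained surrogate on the projected (in-distribution) input.
    y_proj = model(x_proj)
    # 3. Inverse step: undo the scaling so the output is expressed in the
    #    original units of x_test. How output_scale(s) is derived from the
    #    output's physical dimensions is exactly what I would like spelled out.
    return y_proj * output_scale(s)
```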
Reference:
1. How is your work related to Lie point symmetry [1]?
Limitation:
1. Can your method be applied to irregular mesh grids? All baseline models in your paper (CNN, U-Net, and FNO) can only be applied on uniform grids. Does your method share this limitation?
[1] Brandstetter, J., Welling, M., & Worrall, D. E. (2022). Lie point symmetry data augmentation for neural PDE solvers. In International Conference on Machine Learning (pp. 2241–2256). PMLR.
**Questions**
See Weaknesses.
Fully human-written
---
Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
Soundness: 2: fair
Presentation: 1: poor
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
**Summary**
This paper proposes a test-time projection method in which test samples are rescaled to match the scaling of the nearest point in the training data, without changing the nondimensional parameters of the system, by way of the Buckingham $\pi$ theorem. The approach is made more efficient by comparing against centroids rather than the full training data.
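For concreteness, a rough sketch of the centroid shortcut as I understand it (not the authors' implementation; the data and cluster count are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

# Dummy positive parameters standing in for the training set, for
# illustration only: N samples of p physical parameters each.
rng = np.random.default_rng(0)
train_params = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 4))

# Centroid shortcut: cluster in log-parameter space once, offline, and
# search K << N centroids at test time instead of all N samples.
log_train = np.log(train_params)
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(log_train)

def nearest_centroid(log_x_test):
    # Euclidean distances in log space to each centroid
    d = np.linalg.norm(kmeans.cluster_centers_ - log_x_test, axis=1)
    return kmeans.cluster_centers_[np.argmin(d)]
```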
**Strengths**
1. It's a very sensible approach which tackles an important problem.
2. The experiments are informative and show really strong performance.
3. The authors were clearly conscious of the impact of cost and found an approximation that works well.
Overall, the method seems a natural and very promising approach for mitigating poor OOD performance, at least in cases where the OOD shift is due to different dimensional choices, which could easily happen when applying a pretrained model to new data.
**Weaknesses**
While there are some unanswered questions and aspects that could be tightened up, which I'll list further down, the main reason for my current recommendation is that the presentation could use significant improvement, particularly in writing for a machine learning audience, where many readers will have very little experience with dimensional analysis. I think this can be easily rectified with some restructuring and better-built examples. Here are some concrete issues and suggestions:
1. Many readers who work on neural surrogates are not going to be familiar with dimensional analysis (not arguing that this should be the case, but it is currently true), so when Section 3 doesn't explain the basic concept well, the subsequent sections become harder to understand. Expanding the worked example and using it through each stage of the section, describing what the fields are and what the units are, and then showing how to extract $\pi_{th}$ from them, would make the statements significantly more concrete and easier to follow (see the illustrative sketch after this list).
2. It feels like concepts are often not explained in the place where they are presented. One would expect all of the methods to be explained in Section 4, either mathematically or algorithmically, but Section 4.4 just describes the uniform strategy as "tunes the dominant scale while others fixed". If this is supposed to be a method that doesn't require users to perform dimensional analysis on their own, how should they determine the dominant scale? And how does this make the distribution uniform?
3. The experiments are just equations right now. What types of common applications do these equations represent, and why are they interesting test beds? How did you generate the datasets? How are the initial conditions generated?
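To illustrate item 1, here is the kind of worked example I have in mind (using standard heat-conduction quantities; this is my own illustration, not necessarily the paper's exact $\pi_{th}$): take conductivity $k\,[\mathrm{W\,m^{-1}\,K^{-1}}]$, volumetric source $q\,[\mathrm{W\,m^{-3}}]$, domain length $L\,[\mathrm{m}]$, and temperature scale $\Delta T\,[\mathrm{K}]$. With $p = 4$ parameters and a dimension matrix of rank $r = 3$, the Buckingham $\pi$ theorem yields $p - r = 1$ dimensionless group,

$$
\pi_{\mathrm{th}} \;=\; \frac{q\,L^{2}}{k\,\Delta T},
\qquad
\left[\pi_{\mathrm{th}}\right] \;=\; \frac{\mathrm{W\,m^{-3}}\cdot\mathrm{m^{2}}}{\mathrm{W\,m^{-1}\,K^{-1}}\cdot\mathrm{K}} \;=\; 1,
$$

whose exponent vector $(-1, 1, 2, -1)$ over $(k, q, L, \Delta T)$ spans the null space of the dimension matrix. Walking through exactly this kind of computation in Section 3 would make the rest of the paper much easier to follow.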
Other issues:
1. The experiments are currently fairly weak: these are two linear problems. It would be more interesting to see whether the advantages of a linear projection method also hold for nonlinear problems.
2. What are the test and train distributions of the invariants? For instance, it seems like $q$ and $k$ are shifted in the same pattern, which wouldn't necessarily result in disjoint invariant distributions. It would be good to highlight where the method would be expected to fail as well.
Minor:
1. $B$ is not defined in Theorem 1.
2. Random selection is mentioned but not included in Table 1.
**Questions**
1. Often, the surrogate methods you're describing are trained on simulation data which is already nondimensionalized. How would you expect the performance of this method to be affected in this setting, where new data is likely to use the same characteristic scales? Given the data is already dimensionally equivalent, will performance be affected?
2. What's the justification for using the mean in place of characteristic scales? Often in fluids, these scales are tied to the geometry of some feature in the system. Is this something that can be ignored in the current setting?
3. Could you provide more detail on how the datasets were generated?
Fully human-written
---
Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
**Summary**
This paper proposes a Buckingham $\pi$-invariant test-time projection method to improve the out-of-distribution (OOD) robustness of PDE surrogate models such as FNOs, U-Nets, and CNNs. The key idea is that many OOD inputs differ only by unit or scale changes that should be physically equivalent under dimensional analysis. The authors therefore apply the Buckingham $\pi$ theorem to define a $\pi$-group-preserving projection in log space, aligning each test sample with its nearest training sample in parameter space, and they propose a minimization procedure for this alignment. Experiments on steady 2-D thermal conduction and linear elasticity show up to a 90% reduction in OOD error with minimal computational overhead.
**Strengths**
- The paper adapts a century-old but fundamental physical principle (Buckingham $\pi$) into a practical, algorithmic test-time procedure for neural surrogates.
- The combination of a $\pi$-compliant projection and nearest-sample search is a creative way to transfer dimensional invariance into modern ML practice.
- Training-free and model-agnostic: it can wrap around any pretrained surrogate without re-training or altering the loss, which makes it attractive for applied modeling.
- Smooth minimization: the log-space formulation converts multiplicative scaling into a linear subspace problem, solved neatly by an orthogonal projection (spelled out in the sketch after this list).
- The idea may stimulate a broader discussion on how physical similarity and scale invariance can be enforced at inference rather than training time.
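Concretely (my paraphrase of the formulation, with $\Phi$ stacking the null-space basis vectors of the dimension matrix; not the authors' exact equations): writing $z = \log x$ turns a multiplicative rescaling $x \mapsto s \odot x$ into the translation $z \mapsto z + \log s$, so preserving all $\pi$ groups becomes the affine constraint $\Phi^{\top} z = \Phi^{\top} z_{\text{test}}$, and the admissible point closest to a training sample is the orthogonal projection

$$
z^{*} \;=\; z_{\text{train}} \;+\; \Phi\left(\Phi^{\top}\Phi\right)^{-1}\Phi^{\top}\left(z_{\text{test}} - z_{\text{train}}\right).
$$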
**Weaknesses**
- Limited scope of experiments: only simple, steady, linear PDEs (conduction and elasticity) are tested. It remains unclear whether the method holds up for nonlinear, transient, or multi-physics systems.
- The procedure feels overly elaborate for a scaling correction: the projection, clustering, and least-squares steps may appear heavy compared to straightforward normalization or nondimensionalization. There is no exploration of when the procedure is worth the effort and when it is simply more favorable to collect more training data.
- Writing and exposition are often opaque: key transitions between the physical reasoning, the log-space math, and the algorithmic steps are difficult to follow without prior familiarity.
- No guarantee of a true physical neighbor: the nearest-sample search may project the test case toward an unrelated training sample if the $\pi$-space distribution is sparse or multimodal.
- No discussion of mis-specified $\pi$-groups: the procedure assumes the chosen invariants are the correct ones; the effect of using incomplete or incorrect $\pi$-groups is not analyzed.
- Use of mean field values may fail for heterogeneous inputs: collapsing distributed fields into global means ignores spatial structure, which can distort $\pi$ values for systems with strong local variability.
**Questions**
- How sensitive is the method to the choice of $\pi$-groups? Could the projection degrade performance if irrelevant or redundant groups are used? In many problems the $\pi$ groups are actually ratios of problem geometries, and it is unclear how they are to be chosen.
- For heterogeneous domains, can local or hierarchical $\pi$ values be used instead of global means?
- How would the method behave on nonlinear or transient PDEs (e.g., Navier–Stokes, Burgers', advection–diffusion)? Since $\pi$ scaling is used extensively in computational fluid mechanics, there should be examples from CFD with well-known $\pi$ groups.
- What is the computational cost for large training sets before centroid reduction, and how stable are results with different cluster seeds?
Fully AI-generated
---
Buckingham $\pi$-Invariant Test-Time Projection for Robust PDE Surrogate Modeling
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
**Summary**
The paper proposes a projection method, called $\pi$-invariant test-time projection, which clusters training and test data points into dimensionless groups. Such groups are invariant to unit scales and can help separate data points with different physical behaviors. According to the paper, this can help reduce prediction error on OOD samples.
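A minimal sketch of how such unit-invariant grouping could work (my reading; `Phi` and the function are illustrative, not the authors' code):

```python
import numpy as np

def pi_coordinates(params, Phi):
    """Map physical parameters to log-pi coordinates.

    params : (N, p) array of positive physical parameters.
    Phi    : (p, p - r) basis of the dimension matrix's null space.
    Each column of Phi is the exponent vector of a dimensionless
    monomial, so the returned coordinates are unchanged under any
    rescaling of units; clustering on them groups samples by
    physical regime rather than by scale.
    """
    return np.log(params) @ Phi
```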
**Strengths**
1. The introduction of the Buckingham $\pi$ invariant is interesting and new to the CS/ML community, though it might be an old concept in computational physics/applied math.
2. The idea of clustering data points according to different behaviors, without being influenced by unit/scale changes, is interesting and has potential to enhance the generalization of current ML surrogate models.
**Weaknesses**
1. Clarity. This might be the biggest issue: the paper does not explain clearly how the proposed projection method is integrated into the training/testing pipeline of neural operators to enhance OOD prediction. Throughout, the paper focuses on how to do the projection and clustering, but what happens during NO training/testing? Will you train a different NO for each cluster and then use the test-time projection to dynamically determine which NO should predict for each test example? Or will you compute a soft cluster membership and then perform a mixture of predictions? Intuitively, there can be many ways of integrating the proposed method with NO training/testing, or even with data preparation/acquisition. However, this part is significantly lacking, and it is hard to understand how the improvements in the experimental section were obtained.
2. The OOD problem considered in this paper is a bit different from commonly used settings. In the usual OOD setting, one changes the distribution of the inputs to the neural operator rather than assuming a fundamental change in physical behavior. In fact, an ML surrogate (e.g., a NO) is typically used to capture one type of physical behavior under various scenarios. I am not sure whether expanding the scope to make one surrogate model account for several different physical regimes is appropriate or feasible. At the very least, the paper should clarify its own meaning of OOD and its differences from, and connections to, the settings in prior works.
3. The experimental results are limited: results on only two systems are not sufficient in this community. Also, there are no standard deviations in Table 1, making it hard to judge the significance of the improvement.
**Questions**
See above.
Fully human-written |