ICLR 2026 - Reviews

Summary Statistics

EditLens Prediction	Count	Avg Rating	Avg Confidence	Avg Length (chars)
Fully AI-generated	1 (25%)	2.00	4.00	3552
Heavily AI-edited	0 (0%)	N/A	N/A	N/A
Moderately AI-edited	0 (0%)	N/A	N/A	N/A
Lightly AI-edited	3 (75%)	4.00	3.67	2611
Fully human-written	0 (0%)	N/A	N/A	N/A
Total	4 (100%)	3.50	3.75	2846

Title	Ratings	Review Text	EditLens Prediction
Learning Non-Gradient Diffusion Systems via Moment-Evolution and Energetic Variational Approaches	Soundness: 3: good Presentation: 3: good Contribution: 3: good Rating: 4: marginally below the acceptance threshold Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.	In the paper, the authors propose a data-driven method to learn the drift vector field of stochastic dynamic systems. Specifically, a two-stage method based on a physically consistent penalty and first-moment evolution is proposed to solve this problem. - The investigated problem is important. - The mathematical formulation is clearly written. - The idea of considering a rotation field and a potential field is reasonable and interesting. - The code is not provided, which limits the reproducibility of the work. - The experimental section is a significant weakness of the paper. The most critical issue is the lack of comparisons with state-of-the-art (SOTA) baselines from top-tier conferences such as ICLR and NeurIPS. The current comparisons are limited to relatively simple methods, many of which are simplified variants proposed by the authors themselves. - Another weakness lies in the experimental metrics, which are not intuitive, while the analysis tends to be overly subjective. For instance, in line 384, it is stated that “our method still yields reasonably reliable results.” However, it remains unclear what RMSE value qualifies as “reasonable,” as this is highly dependent on the specific scale and context of the system under study. Such claims may lead to confusion. - The results in Figure 2 are also difficult to interpret. It is unclear from the figure whether the proposed method performs well or poorly. 
At the very least, a side-by-side comparison with the fields learned by SOTA methods should be provided to better illustrate the effectiveness of the proposed approach. - The experimental comparisons are currently limited to simplified baselines. Could you include comparisons with state-of-the-art methods to better demonstrate the relative performance and competitive advantage of your proposed approach? - Please clarify the plans for releasing the source code and the experimental setup.	Lightly AI-edited
Learning Non-Gradient Diffusion Systems via Moment-Evolution and Energetic Variational Approaches	Soundness: 2: fair Presentation: 2: fair Contribution: 2: fair Rating: 2: reject Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.	This paper proposes a two-stage weak-form learning framework for recovering drift decompositions in generalized diffusions without detailed balance. Stage 1 identifies the drift from first-moment evolution; Stage 2 recovers the pseudo-potential using an energy-dissipation law with a physics-motivated orthogonality penalty. The idea of combining weak-form moment evolution with energy-based learning is interesting and potentially impactful for non-gradient stochastic dynamics. However, several parts of the theoretical formulation, numerical justification, and experimental design remain insufficiently rigorous or clearly motivated. 1. Addresses the important problem of learning non-gradient stochastic dynamics, beyond detailed-balance systems. 2. The weak-form formulation is appealing for noisy data, avoiding higher-order derivative estimation. 3. The paper provides multiple 2D diffusion examples, including noisy and rough potentials, plus ablation studies on penalty and training strategies. 1. Derivation of Equations (6)–(9) lacks rigor and clarity. 1.1 The transition from Eq. (4) to Eq. (6) appears ad hoc and not rigorously derived from the underlying stochastic dynamics or variational principles. 1.2 It is unclear how Eq. (8) is obtained from Eq. (6)—specifically, how the second term in Eq. (6) is eliminated and under what assumptions this simplification holds. 1.3 The statement that “we can minimize (8) in a weak form to learn the pseudo-potential and the rotation part” lacks justification. 
The rationale for why minimizing this functional corresponds to learning the desired decomposition should be explicitly established. 2. The constraint $\nabla\psi \cdot R = 0$ is enforced only via an integrated (global) penalty term. There is no theoretical argument showing that minimizing this global loss guarantees pointwise orthogonality. A discussion of this discrepancy and its practical implications would be important. 3. The paper provides no analysis of the consistency, bias, or variance of the proposed estimators. Without such analysis, it is unclear under what conditions the learned drift and potential converge to the true physical quantities. The good numerical results currently shown may depend strongly on the specific form of the training data rather than the generality of the method. In particular, the dataset includes distributions at long times, which might already be close to the stationary distribution. This could artificially improve the training performance. It is recommended to quantify this effect—for example, by computing and plotting the distance between the data distribution at large t and the stationary distribution—to clarify how much of the observed accuracy stems from near-stationary data. 4. All numerical examples are synthetic 2-D toy problems. The absence of higher-dimensional or real-world cases limits the demonstration of scalability and practical relevance. Moreover, no quantitative evaluation of runtime, efficiency, or robustness across architectures is provided. 5. The assumption $b = -\tfrac{1}{2}\sigma^2\nabla\psi + \tfrac{1}{2}\sigma^2R$ is central to the method but not theoretically or physically discussed. All test cases are artificially constructed to satisfy this decomposition, which weakens the claim of general applicability. The paper should clarify under what conditions this assumption holds and how violations would affect learning performance. 6. 
The assumption that $\sigma$ is a scalar function is not discussed either. Please see the weaknesses above.	Fully AI-generated
Learning Non-Gradient Diffusion Systems via Moment-Evolution and Energetic Variational Approaches	Soundness: 4: excellent Presentation: 4: excellent Contribution: 2: fair Rating: 6: marginally above the acceptance threshold Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.	The authors propose a two-stage method for learning the drift of SDEs in the setting where the ground truth SDE does not satisfy a detailed balance condition. The approach is based on decomposing the drift into a gradient (pseudo-potential) and a rotational term. The framework uses snapshots of the probability density function generated from different initial Gaussian densities, captured at both short and intermediate times. Stage 1 learns the total drift via first-moment evolution, and Stage 2 learns the decomposition using an energy dissipation law. - The authors identify a tractable subset of the difficult non-gradient SDE learning problem: systems where the drift decomposition satisfies a pointwise orthogonality constraint. - The paper proposes a novel two-stage learning framework that cleverly combines moment-evolution and energy-dissipation principles. - The numerical evaluation, while limited to 2D examples, is thorough. It effectively demonstrates the method's robustness to significant data noise, rough potentials, and non-canonical rotations. - The work is very well presented. - The method's primary weakness is its data requirement. It assumes access to full, gridded snapshots of the density function, which is unrealistic in most practical applications where data typically consists of sparse, noisy particle trajectories. - The reliance on gridded data and Riemann sums for integration raises concerns about the method's scalability to high-dimensional problems due to the curse of dimensionality. 
- The authors claim applicability to biology and engineering, but the experiments are limited to 2D toy problems. - The entire method is contingent on the pointwise orthogonality constraint. There is limited discussion on the prevalence of this assumption in real-world systems or how the method's performance degrades if this constraint is only approximately satisfied. - The noise robustness experiment is a good inclusion. However, could the authors provide a more formal analysis of error propagation? Specifically, if the density f were estimated from sparse data (e.g., via KDE), how would that estimation error propagate through the loss functions? - How does this method perform in practical applied settings?	Lightly AI-edited
Learning Non-Gradient Diffusion Systems via Moment-Evolution and Energetic Variational Approaches	Soundness: 2: fair Presentation: 3: good Contribution: 3: good Rating: 2: reject Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.	This paper proposes a two-stage learning algorithm for generalized diffusion processes with non-gradient drift fields: the first stage learns the drift field from first-moment estimates, and the second stage learns the pseudo-potential part of the drift field by applying a physically consistent penalty in the loss to enforce orthogonality of the pseudo-potential and rotational components. The method is built on prior work such as Lu et al. (2024) and introduces a penalty term by considering the pointwise orthogonality of the pseudo-potential and rotational components to improve robustness to noisy density data. Numerical experiments in low dimensions illustrate that the method better recovers the rotational components than baseline approaches. The paper presents an interesting and physically grounded approach to learning non-gradient diffusion drift decomposition via Helmholtz methods. However, the empirical and theoretical scope remains limited to low-dimensional, synthetic settings. With stronger sensitivity analyses, realistic applications, and tighter parameter guidance, the work could become significantly more compelling. (1) The application of Helmholtz decomposition to drift learning is conceptually compelling: separating the gradient (pseudo-potential) and divergence-free (rotational) parts aligns with physical modelling of non-equilibrium systems. (2) The physically consistent penalty enforcing pointwise orthogonality is well motivated, matches the dimension of the energy dissipation rate, and may improve robustness to noisy data. 
(3) The authors provide clear implementation details and present a set of representative synthetic examples, which show improved drift reconstruction over simpler baselines. (1) The two-stage algorithm requires accurate density function data on a relatively large domain with dense spatial grids; this limits applicability to low-dimensional problems. The manuscript primarily uses numerical solutions of the Fokker–Planck (FP) equation as “given” density data, which raises the question: if the FP drift/diffusion terms are known (so the equation can be solved), then the learning task is less realistic. (2) There is minimal discussion or theory guiding the choice of time windows $t_1,t_2$, $T_1,T_2$ and penalty strength $\lambda$ with respect to the underlying diffusion process (e.g., drift/diffusion regularity, relaxation time scales, spectrum of the generator). (3) As acknowledged by the authors, the method cannot currently learn time-dependent pseudo-potentials, rotational components that vary in time, or diffusion processes with nonlocal effects. These restrictions should be discussed more clearly in terms of limitations and future work. (4) The method depends on an accurate density field; the manuscript lacks any numerical study of how errors in the density data propagate into drift estimation and result in bias. (5) The numerical examples remain synthetic and low-dimensional. I suggest applying the method to a more practically motivated diffusion (e.g., impurity diffusion in crystalline solids) to demonstrate relevance beyond toy problems. (1) I wonder how errors in the density data propagate into drift estimation, affect the choice of hyperparameters, and result in bias in the learning objectives. (2) Following the previous question, I wonder if some types of errors or noise in the density data are less important to the learning of the rotational components. 
(3) I suggest applying the method to a more practically motivated diffusion (e.g., impurity diffusion in crystalline solids) or higher dimensional examples to demonstrate relevance beyond toy problems.	Lightly AI-edited
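
One reviewer's concern that an integrated (global) penalty need not imply pointwise orthogonality $\nabla\psi \cdot R = 0$ can be illustrated with a small numerical sketch. The exact penalty used in the paper is not stated in the reviews, so everything below is an assumption made for illustration: the toy potential $\psi = (x^2 + y^2)/2$, the Gaussian weight `f`, the two candidate divergence-free fields, and the two penalty forms (`signed_penalty`, `squared_penalty`) are hypothetical and are not taken from the submission.

```python
import numpy as np

# Symmetric 2D grid and a standard Gaussian weight f (an assumed stand-in
# for a density snapshot; not the paper's data).
x = np.linspace(-6.0, 6.0, 601)
X, Y = np.meshgrid(x, x, indexing="ij")
dA = (x[1] - x[0]) ** 2
f = np.exp(-(X**2 + Y**2) / 2.0) / (2.0 * np.pi)

# Toy pseudo-potential psi = (x^2 + y^2) / 2, so grad psi = (x, y).
grad_psi = (X, Y)

# Two divergence-free candidate rotation fields (both hypothetical):
R_orth = (-Y, X)  # grad psi . R = 0 at every grid point
R_bad = (Y, X)    # grad psi . R = 2xy, nonzero away from the axes


def dot(u, v):
    """Pointwise Euclidean inner product of two 2D vector fields."""
    return u[0] * v[0] + u[1] * v[1]


def signed_penalty(R):
    """Riemann sum for the signed integral of (grad psi . R) f."""
    return np.sum(dot(grad_psi, R) * f) * dA


def squared_penalty(R):
    """Riemann sum for the integral of (grad psi . R)^2 f."""
    return np.sum(dot(grad_psi, R) ** 2 * f) * dA


for name, R in [("R_orth", R_orth), ("R_bad", R_bad)]:
    print(f"{name}: signed = {signed_penalty(R):.3e}, "
          f"squared = {squared_penalty(R):.3e}")
```

In this toy setting the signed integral vanishes for both fields because $2xy$ is odd, so it cannot distinguish them, while the squared, density-weighted form is zero only for the pointwise-orthogonal field. Even a squared penalty only controls orthogonality where the weight is non-negligible, and a finite penalty strength generally leaves a nonzero residual, which is the gap the reviewer asks the authors to discuss.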

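The suggestion in one review to quantify how close the late-time training data are to the stationary distribution can be sketched on a case where both densities are known in closed form. This is a minimal illustration under stated assumptions, not the paper's setup: a 1D Ornstein-Uhlenbeck process $dX = -X\,dt + \sigma\,dW$ with a Gaussian initial density is assumed, and the function names and parameter values (`ou_density`, `m0`, `v0`, `sigma`) are hypothetical. For gridded density snapshots, the same Riemann-sum $L^1$ distance could be applied directly to the data arrays.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated on the grid x."""
    return np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def ou_density(x, t, m0=2.0, v0=0.25, sigma=1.0):
    """Density at time t of dX = -X dt + sigma dW started from N(m0, v0)."""
    mean = m0 * np.exp(-t)
    var = v0 * np.exp(-2.0 * t) + 0.5 * sigma**2 * (1.0 - np.exp(-2.0 * t))
    return gaussian_pdf(x, mean, var)


def l1_distance(f, g, dx):
    """Riemann-sum approximation of the L1 distance between two densities."""
    return np.sum(np.abs(f - g)) * dx


x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

# Stationary density of the assumed process: N(0, sigma^2 / 2) with sigma = 1.
stationary = gaussian_pdf(x, 0.0, 0.5)

times = [0.1, 0.5, 1.0, 2.0, 5.0]
distances = [l1_distance(ou_density(x, t), stationary, dx) for t in times]

for t, d in zip(times, distances):
    print(f"t = {t:4.1f}   L1 distance to stationary = {d:.4f}")
```

Plotting such a distance against the snapshot times used for training would show which snapshots are effectively stationary, which is the diagnostic the reviewer requests.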