Physics-informed Residual Flows
Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Physics-informed Residual Flows studies optimization issues in physics-informed neural networks (PINNs). Two issues are identified: gradient shattering and flow mismatch. After an empirical and theoretical study of these problems, the paper proposes to solve them with residual models. The final part evaluates the proposed method and compares it with existing baselines.
### Strengths
- The subject is a well-identified issue of PINNs.
- The theoretical analysis supports the claims.
- A descriptive study illustrates the claims.
- Numerous evaluations illustrate the proposed method.
### Weaknesses
- The paper is rich and introduces several concepts. The two key insights of the paper sometimes interfere, which makes the reading a bit hard.
- Some notation is not introduced (e.g., h and f on line 323).
### Questions
- Lines 30-45: Isn't this problem linked to the conditioning of the PINN loss? This has been studied recently in numerous works (e.g., [1-3]).
- Could you detail the architecture in Table 1? Additionally, at first reading it was not clear that ‘-‘ referred to a minus sign (which is the case, in my understanding?).
- I think a link/explanation/comparison with existing residual networks in PINNs would help the reader better understand the contribution of your work. Why do related residual-based PINN models not observe the same improvement as yours?
- Line 228: This formulation seems linked to the second-order optimization proposed in [1], Proposition 1. Could you elaborate on the link between the two methods?
- In practice, what does a "small" $\alpha_k$ mean? (line 249)
- Have you considered adding P-PINNs and PRF to Table 1 for comparison? Do you have any insight into why O-PINN performs best on some PDEs while ResPINN performs best on others?
- Table 3 shows that hyperparameter selection is important. Do you have any insight into how to select hyperparameters carefully?
- Line 445: Have you compared to learning all blocks at once? The idea is to ablate how much the progressive learning of blocks helps, compared to the exact same network trained in the standard way.
- Can you comment on the results on the PINNacle benchmark (Table 5 in the appendix)? What explains the OOM? What is the setting?
### Minor comments
- Line 046: "need not align" -> "is not aligned"?
- Line 178: What does $\Theta$ refer to?

I am not an expert in the theory behind the proofs, so I could not check them.
### References
[1] Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective. Sifan Wang, Ananyae Kumar Bhartari, Bowen Li, Paris Perdikaris. 2025.
[2] An Operator Preconditioning Perspective on Training in Physics-informed Machine Learning. Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac. 2023.
[3] Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods. Lise Le Boudec, Emmanuel de Bezenac, Louis Serrano, Ramon Daniel Regueiro-Espino, Yuan Yin, Patrick Gallinari. 2024.
Fully human-written

---

Physics-informed Residual Flows
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper argues that many failures of PINNs come from two structural issues. The first is gradient shattering, i.e., unstable or uninformative derivatives with depth. The second is flow mismatch, i.e., updates that reduce residuals locally but drift from the true PDE trajectory. It proposes reframing PINNs as residual flows (“ResPINNs”) that make small, iterative corrections, which keep updates aligned with descent directions and maintain near-identity Jacobians for stable gradient propagation. Theory and controlled ablations suggest the stability gains recently attributed to attention or state-space sequence modules are actually driven by residual pathways themselves; when attention/SSM blocks are replaced by simple local mappings with matched parameters, performance remains comparable, pointing to residual refinement as the key mechanism.
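As I read it, the core update is of the form (my paraphrase, not the paper's exact notation):
$$z_{k+1} = z_k + \alpha_k\,T_k(z_k), \qquad \frac{\partial z_{k+1}}{\partial z_k} = I + \alpha_k\,\nabla T_k(z_k),$$
so for small $\alpha_k$ each block makes a small corrective step and its Jacobian stays near the identity, which is the claimed source of stable gradient propagation.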
1. Viewing PINNs' failure modes from the perspective of flow matching is novel, and the proposed method is effective.
2. The paper offers a crisp diagnosis of two structural failure modes—gradient shattering and flow mismatch—and ties them to concrete training pathologies in PINNs, not just optimization folklore.
3. The paper provides stabilizing principles with theory: keep Jacobians near identity and updates aligned with descent, which directly target those failure modes rather than adding heuristics.
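Concretely, my reading of the near-identity principle in point 3 (a sketch in my own notation, not the paper's exact statement): if each block's Jacobian is $I + \alpha\,\nabla T_k$ with $\|\nabla T_k\| \le L$, then the end-to-end Jacobian satisfies
$$\Big\|\prod_{k=1}^{K}\big(I + \alpha\,\nabla T_k\big)\Big\| \le (1 + \alpha L)^K \approx e^{\alpha L K},$$
so for small $\alpha$ the product stays well-conditioned in the depth $K$, and gradients neither explode nor shatter.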
1. The paper's claims of “higher accuracy with fewer parameters” are strong, but wall-clock time, GPU memory, and gradient/Jacobian compute overheads are not reported, which is especially important given the OOM remarks for baselines. Include per-task training time, peak memory, and AD call counts to show efficiency, not just parameter counts.
2. The paper lacks clarity in describing the proposed method, making it difficult for readers without prior knowledge of residual flows to follow. The authors should provide clearer explanations of the model architecture, optimization objectives, and other key aspects, ideally supplemented with schematic diagrams (I see Fig. 5 in the appendix, but I think it belongs in the main text, with more description as a paragraph). Additionally, they should offer a comparative description of the implementation details for the three proposed approaches.
1. What is the difference between ResPINN and a residual-connected MLP? Why not directly use a residual-connected MLP?
2. A vanilla PINN (MLP + tanh) trained in FP64 precision [1] does not show these failure modes, and some of its results even surpass ResPINN. If the flow-based understanding holds, why does this happen? Does that mean there is no flow mismatch when training in FP64?
3. Can the authors report the proposed method's sensitivity to arithmetic precision?
4. I found that the authors' supplementary material is an easy-to-use PINN playground with potential for contribution and impact. However, most of the code files are not visible; can the authors fix this? And will the playground be open-sourced?

I am open to increasing the rating if all the concerns are addressed.
Reference:
[1] Xu C, Liu D, Nassereldine A, et al. FP64 is All You Need: Rethinking Failure Modes in Physics-Informed Neural Networks. NeurIPS 2025.

Lightly AI-edited

---

Physics-informed Residual Flows
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This paper revisits the failure modes of Physics-Informed Neural Networks (PINNs), attributing them to two structural issues: gradient shattering and flow mismatch. To address these challenges, the authors propose ResPINNs, a residual-flow formulation that interprets PINNs as iterative refinement schemes similar to classical predictor–corrector solvers. Theoretical analysis suggests that residual flows preserve gradient alignment and maintain near-identity Jacobians, which helps stabilize optimization. Experiments on several benchmark problems, including the Reaction, Convection, Wave, and Heat equations and the PINNacle suite, indicate improved accuracy compared to standard PINNs and recent variants such as PINNsFormer and PINNMamba.
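For context, the predictor–corrector analogy I have in mind is a standard Heun-type scheme (my illustration, not the paper's exact formulation):
$$\tilde z_{k+1} = z_k + h\,T(z_k) \ \text{(predict)}, \qquad z_{k+1} = z_k + \tfrac{h}{2}\big(T(z_k) + T(\tilde z_{k+1})\big) \ \text{(correct)},$$
i.e., each step applies a small correction toward the underlying trajectory, which is how the paper casts its residual blocks as iterative refinement.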
The paper presents a conceptually appealing idea that links residual networks, neural ODEs, and PINNs within a common residual-flow perspective. The attempt to provide a theoretical explanation of gradient alignment and Jacobian stability is interesting and may offer insight into training dynamics of PINNs. The empirical section is clearly written, and the topic of improving PINN robustness remains relevant to the ICLR audience.
1. The theoretical analysis is not fully convincing. The paper identifies gradient shattering and flow mismatch as the main failure modes of PINNs and uses several theorems to justify the need for residual connections. However, prior studies have shown that other issues, such as gradient conflicts and spectral bias, can also be major sources of error. The authors should provide quantitative evidence or controlled comparisons to demonstrate that the proposed failure modes are indeed the dominant ones.
2. The experimental design in Section 5.2 is questionable. The gradual freezing strategy may seem to support iterative refinement since all stages share the same decoder. However, freezing earlier blocks fundamentally changes the optimization process compared with a standard residual network trained end-to-end. To support the claim, the authors should visualize intermediate blocks of ResPINN to directly show refinement behavior.
3. The empirical evaluation is relatively weak. Several strong baselines are missing. The authors should compare with more competitive works such as Urbán et al. [1] and Wang et al. [2] to provide a fair and convincing performance assessment.
4. The benchmarks used in the paper appear much simpler than those in recent studies [1, 2], which have demonstrated substantially higher accuracy on more challenging PDEs. If existing methods already achieve better accuracy, this may suggest that the proposed failure modes are not the primary limitation of PINNs. The authors should consider harder tasks and report better accuracy to better support their claim.
5. It seems that PirateNet has already explored the challenges of scaling PINNs to deeper architectures, but this connection is not clearly acknowledged or discussed in the paper. Without clarifying how the proposed approach differs from or extends PirateNet, the claimed novelty appears overstated.
[1] Urbán, J.F., Stefanou, P. and Pons, J.A., 2025. Unveiling the optimization process of physics informed neural networks: How accurate and competitive can PINNs be?. Journal of Computational Physics, 523, p.113656.
[2] Wang, S., Li, B., Chen, Y. and Perdikaris, P., 2024. PirateNets: Physics-informed deep learning with residual adaptive networks. Journal of Machine Learning Research, 25(402), pp.1-51.
1. On line 323, what is the value of the parameter α? Is it a fixed constant or a learnable variable? If it is set to one, does the formulation reduce to a standard ResNet, and if it is learnable, how is it initialized and how does it differ from the skip connections in PirateNet?
2. PirateNet appears to share many architectural and conceptual similarities with the proposed model yet achieves higher accuracy in reported results. Why did the authors not include a direct comparison with PirateNet under the same settings?
3. Can the proof of Theorem 3.1 be made more self-contained? Would it be possible to include a simple numerical experiment to empirically verify the theorem’s predictions?
4. In Figure 4, how exactly does the presented Jacobian distribution imply vanishing gradients? Could the authors clarify this interpretation?
5. In Section 3.3, the statement “we interpret training as a latent-space flow problem indexed by an auxiliary solver time k” is confusing. What exactly is the meaning of this auxiliary time variable k? This formulation seems to conflate the ODE describing training dynamics of the network with the ODE governing forward propagation in ResNets. Could the authors clarify this?
Lightly AI-edited

---

Physics-informed Residual Flows
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The paper diagnoses two structural failure modes in PINNs (gradient shattering and flow mismatch) and proposes ResPINNs, a residual-flow reformulation (with discrete ResPINN and continuous O-PINN instantiations) that enforces small corrective updates and near-identity Jacobians to stabilize training and improve solution fidelity. The authors present a mix of theoretical arguments (mean-field-style statements about Jacobian decorrelation, a lemma on local descent, and bounds on Jacobian conditioning), ablations that isolate residual pathways from sequence modules, and empirical evidence across canonical PDEs and the large PINNacle benchmark showing substantial error reductions and more stable diagnostics (Jacobian spectra, gradient alignment, update magnitudes).
1. The authors provide a clear definition of the two failure modes and explain why these are particularly harmful for PINNs that use sparse collocation points.
2. The paper offers an explicit theoretical account that links Jacobian decorrelation and update alignment to optimization behavior in residual flows.
3. The residual-flow formulation is connected to well-understood numerical ideas (Euler updates, RK solvers) and neural ODEs, which helps position ResPINNs as a principled, solver-inspired architecture; see the sketch after this list.
4. The ablation experiments are designed to isolate residual pathways from other architectural changes by matching parameter counts and replacing sequence modules with simpler mappings.
5. Empirical results across canonical PDEs and the PINNacle suite show large and consistent error reductions for ResPINN relative to multiple baselines, supporting the method’s breadth.
6. The paper includes mechanistic diagnostics (Jacobian spectra, relative update sizes, gradient alignment) that directly support the proposed mechanism rather than relying on performance numbers alone.
7. Reproducibility materials are noted (appendices for proofs, PDE setups, architecture details, and an anonymous code repository), which increases confidence that the experiments can be validated.
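On point 3 above, the solver connection as I understand it (standard identities, my notation):
$$\frac{dz}{dt} = T_\theta(z, t) \quad\Longrightarrow\quad z_{k+1} = z_k + \Delta t\,T_\theta(z_k, t_k),$$
i.e., stacked residual blocks act as a forward-Euler discretization of a latent flow, while the continuous O-PINN variant presumably integrates the same flow with a higher-order scheme such as RK4.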
1. The mean-field “gradient shattering” argument depends on large-width asymptotics, but the paper does not quantify how these asymptotic results translate to the finite-width networks used in experiments.
2. The local descent results (Lemma 3.2 / Theorem 3.3) assume small per-step updates and depth-aware smoothness constants, yet the manuscript provides limited prescriptive guidance on how to choose residual scaling or other hyperparameters to ensure these assumptions hold in practice.
3. Wall-clock runtime and peak memory costs relative to the baselines are not reported in sufficient detail, yet the paper claims some baselines run out of memory while ResPINN “trains successfully”.
4. The evaluation excludes the four hardest PINNacle subtasks, which may bias the apparent generality of ResPINN.
5. The paper frames O-PINN (continuous) vs ResPINN (discrete) as complementary but does not provide clear operational guidance for choosing between them in realistic settings (e.g., when to prefer RK4 integration vs stacked residuals).
1. Which concrete hyperparameter(s) (residual scaling α, block width/depth, or normalization) most directly control the “small-step” regime used in the proofs, and can the authors provide recommended ranges observed empirically?
2. For tasks where baselines “fail to converge” or run out of memory on PINNacle, which specific tasks and baseline configurations fail, and were targeted recovery attempts (e.g., reduced batch sizes, lower widths) tried?
3. How sensitive are the Jacobian and alignment diagnostics to the evaluation mesh density and collocation sampling scheme (fixed grid vs adaptive sampling)?
4. Have the authors tested ResPINN on inverse or noisy-data PDE problems where data-fitting and PDE residual objectives compete, and if so, how does residual alignment affect identifiability and overfitting?
5. Can the authors provide a concrete example (hyperparameters, step sizes, and observed $\|T_k\|$ ranges) showing that the practical training runs stayed within the small-step regime used in the theoretical lemmas?
6. Do the authors expect the same residual-flow benefits to hold for alternative PDE backbones (e.g., Fourier-based FNOs or DeepONet variants), or are the claims specific to fully connected PINN architectures?
7. The paper reports that O-PINN sometimes performs better with wavelet activations; can the authors elaborate on why certain activations interact favorably with continuous integration schemes?

Fully AI-generated