How Learning Dynamics Drive Adversarially Robust Generalization?
Soundness: 4: excellent
Presentation: 4: excellent
Contribution: 4: excellent
Rating: 8: accept, good paper
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper studies PAC Bayes guarantees for adversarial loss. The authors start with a general upper bound in terms of quantities related to the posterior, then make it more specific to Gaussian posteriors. Subsequently, they study what that posterior looks like in two regimes of SGD with momentum on the adversarial loss (I think -- see below). They verify the functionality of their bounds experimentally.
The paper is written extremely clearly, and the development is very logical. I genuinely enjoyed reading the paper. The results are nice and insightful. I am not close enough to the literature to evaluate how different they are from existing results, but they are interesting and the approach is well-motivated.
I see how the proposed Gaussian model is in fact less restrictive than models in previous work, but I am curious if the authors can comment on its limitations. I also have a few additional questions that are in the section below.
1. What is the quantifier over $\epsilon$ in Defn 3.1? Shouldn't this be the $\epsilon$-adversarial risk, or something similar?
2. What role does the geometry induced by the metric in which the perturbation is bounded play in the generalization bounds?
3. Is the analysis for SGD on the standard (non-adversarial) loss, followed by evaluation of that predictor on the adversarial loss, or for SGD run directly on the adversarial loss? I think the latter, but I want to confirm.
Fully human-written

How Learning Dynamics Drive Adversarially Robust Generalization?
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper investigates robust generalization within the PAC-Bayesian framework. By assuming a Gaussian posterior over the model parameters and a quadratic approximation of the loss, the authors derive an upper bound on the robust generalization gap. This bound connects the generalization behavior to the Hessian of the empirical adversarial loss at a local optimum, as well as to the mean and covariance of the posterior. The paper further analyzes how the parameters of this Gaussian posterior evolve during SGD with Polyak momentum, thereby linking training dynamics to robust generalization.
The paper approaches robust generalization from an appealing dynamic perspective—focusing on how the optimization trajectory and posterior evolution influence generalization—rather than adopting a static hypothesis-space view based on Rademacher complexity or other capacity measures.
The connection between posterior evolution and curvature-based generalization offers an interesting conceptual direction, potentially bridging PAC-Bayesian analysis and training dynamics.
1. Unclear contribution and novelty. The paper does not clearly describe its theoretical novelty compared with prior PAC-Bayesian analyses of adversarial robustness (e.g., [Viallard et al., 2021](https://proceedings.neurips.cc/paper/2021/file/78e8dffe65a2898eef68a33b8db35b78-Paper.pdf); [Mustafa et al., 2019](https://ml.cs.rptu.de/publications/2023/computing_non_vacuous_pac_bayes.pdf); Xiao et al., 2023). It is unclear how the presented bounds improve on these previous results.
2. Concerns regarding the quadratic loss assumption. Assumption 3.5 requires $\hat{R}_{\rm adv}(w, S)$ to be a quadratic function of $w$ for any choice of $S$. Taking $S = \{(x, y)\}$, the assumption implies that the adversarial loss itself is quadratic in $w$ for every $(x, y)$:
$$\hat{R}_{\rm adv}(w, S) = \ell_{\rm adv}(w; x, y) = \ell_{\rm adv}(w^*; x, y) + \frac{1}{2}(w - w^*)^{T} H (w - w^*).$$
This severely departs from practical settings where $\ell_{\rm adv}$ involves deep neural networks and non-quadratic losses such as cross-entropy (see the numerical sketch after this list).
3. Limited explanatory power for robust generalization. The paper claims to shed light on robust overfitting (Rice et al., 2020), yet the derived bounds do not appear to explain when robust overfitting occurs or disappears. For instance, adversarial training achieves good robust generalization on MNIST (Madry et al., 2019) and for small perturbation radii or large datasets, whereas robust overfitting is prominent only under certain regimes. The presented bounds (Theorems 3.7, 4.5, and 4.7) do not capture how data distribution, perturbation radius, or sample size affect the generalization behavior. Moreover, when the perturbation radius approaches zero, the framework fails to recover standard generalization phenomena (e.g., CIFAR-10 models generalizing well under clean training but not under adversarial training).
4. Writing and presentation issues. Several statements are vague or lack sufficient justification. Ad hoc terms are introduced (e.g., “propagation term,” “injected term”) without formal definition or motivation. Key assumptions (e.g., stationarity, steady state) are not clearly stated or connected to the analysis. See “Questions” below for specifics.
5. Missing key references. The paper omits several relevant studies on robust generalization and algorithmic stability, including:
(1) Yue Xing et al., On the algorithmic stability of adversarial training.
(2) Jiancong Xiao et al., Stability analysis and generalization bounds of adversarial training.
(3) Runzhi Tian et al., Algorithmic stability based generalization bounds for adversarial training.
(4) Daniel Cullina et al., PAC-learning in the presence of evasion adversaries.
(5) Shaopeng Fu et al., Theoretical analysis of robust overfitting for wide DNNs: an NTK approach.
(6) Viallard et al., A PAC-Bayes analysis of adversarial robustness.
(7) Mustafa et al., Non-vacuous PAC-Bayes bounds for models under adversarial corruptions.
Proper discussion of these works would better situate the contribution and clarify the incremental advance.
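To make the concern in point 2 concrete, here is a minimal numerical sketch of my own (a single example, a linear model with logistic loss, and finite-difference derivatives; this is a hypothetical setup, not the authors' architecture). Even in this far simpler case, the per-example loss departs quickly from its second-order expansion around a reference point:

```python
# Hypothetical illustration: a logistic (cross-entropy-type) loss of a linear model
# is already non-quadratic in w, so the exact quadratic form in Assumption 3.5
# cannot hold per example. Uses finite differences to avoid any autodiff dependency.
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0          # one example (x, y)

def loss(w):
    z = y * (w @ x)
    return np.log1p(np.exp(-z))          # logistic loss, non-quadratic in w

w_star = 0.1 * rng.normal(size=3)        # stand-in for a reference point / local optimum
eps = 1e-5

def grad(w):
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

H = np.zeros((3, 3))                      # finite-difference Hessian at w_star
for i in range(3):
    e = np.zeros(3); e[i] = eps
    H[:, i] = (grad(w_star + e) - grad(w_star - e)) / (2 * eps)

# compare the true loss with its quadratic expansion as we move away from w_star
for r in [0.1, 1.0, 5.0]:
    d = rng.normal(size=3); d = r * d / np.linalg.norm(d)
    quad = loss(w_star) + grad(w_star) @ d + 0.5 * d @ H @ d
    print(f"radius {r:4.1f}: true loss {loss(w_star + d):.4f}  quadratic approx {quad:.4f}")
```

For deep networks with cross-entropy, the discrepancy at realistic distances from $w^*$ should be even larger, which is why a justification of the assumption's regime of validity seems needed.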
Minor Issue:
Lemma 3.4 restates the closed-form expression for the KL divergence between Gaussians, which is standard and could be omitted or moved to an appendix.
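For reference, the standard closed-form expression for $m$-dimensional Gaussians, which, as I read it, is what Lemma 3.4 states, is
$$\mathrm{KL}\big(\mathcal{N}(\mu_1, \Sigma_1)\,\|\,\mathcal{N}(\mu_0, \Sigma_0)\big) = \frac{1}{2}\left[\operatorname{tr}\!\left(\Sigma_0^{-1}\Sigma_1\right) + (\mu_0 - \mu_1)^{T}\Sigma_0^{-1}(\mu_0 - \mu_1) - m + \ln\frac{\det\Sigma_0}{\det\Sigma_1}\right],$$
so stating it as a lemma in the main text adds little.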
>However, these bounds often abstract away from the actual optimization trajectory and adopt simple isotropic Gaussian posteriors for tractability, overlooking structural properties of the learned model that are crucial for explaining generalization.
Could the authors elaborate on how existing PAC-Bayesian bounds "_abstract away_" from the actual optimization trajectory?
In addition, please clarify what specific limitations prior works face in explaining robust generalization. Is the key issue primarily the use of isotropic Gaussian posteriors, or are there other underlying factors? If the isotropic assumption is central, please explain why it limits explanatory power.
>PAC-Bayes bounds offer general guarantees but lack fidelity to the learning dynamics, whereas curvature-based approaches provide qualitative insight without rigorous predictive guarantees.
What specific _guarantees_ are being referred to? What does “lack fidelity” mean in this context? Please make these terms precise and support the claim with references.
>we use d, m to denote the dimensions of the input space ${\cal S}$.
Typo: the input space should be denoted by ${\cal X}$ not ${\cal S}$.
In Assumption 3.5, the statement "for any $w \sim {\cal Q}$" (since ${\cal Q}$ is Gaussian and hence has full support) appears equivalent to "for any $w \in {\mathbb R}^m$". If so, the assumption is effectively independent of ${\cal Q}$; please clarify.
The remark for Lemma 3.6 (Line 188-190) merely restates the formula. Could the authors elaborate on the insight or interpretation of this result?
Regarding the local optimum $w^*$ in Assumption 3.5: since $w^*$ depends on $S$, should it not be treated as a random variable rather than as a constant, as stated at Line 202?
At Line 220, please cite relevant references for SGD with Polyak momentum and clarify whether the theoretical analysis extends to standard SGD.
In Lemma 4.2,
>suppose the posterior Q reaches a steady state with stationary mean
Could you justify this assumption and define precisely what a “steady” or “stationary” state means here?
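For concreteness, here is a minimal simulation sketch of one possible operational reading of "steady state": after enough steps, the across-run mean and variance of the iterates stop changing. This is my own hypothetical setup (a 1-D quadratic loss, Gaussian gradient noise, and a heavy-ball/Polyak-style momentum update), not necessarily the authors' exact dynamics; it would help if the authors stated whether this is the intended meaning and under what conditions it holds.

```python
# Hypothetical 1-D example: SGD with heavy-ball momentum on 0.5*h*w^2 with Gaussian
# gradient noise; the across-run variance of w plateaus, i.e., reaches a steady state.
import numpy as np

rng = np.random.default_rng(0)
h, eta, beta, sigma = 1.0, 0.05, 0.9, 0.5   # curvature, step size, momentum, noise std
n_runs, n_steps = 5000, 3000

w = rng.normal(size=n_runs)                  # independent runs / initializations
v = np.zeros(n_runs)                         # momentum buffer
for t in range(1, n_steps + 1):
    g = h * w + sigma * rng.normal(size=n_runs)   # stochastic gradient
    v = beta * v + g
    w = w - eta * v
    if t in (100, 500, 1000, 2000, 3000):
        print(f"step {t:5d}: mean {w.mean():+.4f}  variance {w.var():.5f}")
```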
Does ${\cal Q}$ denote the marginal distribution of $w _ T$ after T SGD steps, or the conditional distribution of $w_t$ given $w_{t-1}$? Clarifying this would help interpret Theorem 4.5.
For Remark 4.8, please define and justify the introduced terms “propagation term” and “injected term.”
Fully human-written

How Learning Dynamics Drive Adversarially Robust Generalization?
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but it is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper studies robust generalization and uses the PAC-Bayesian framework to derive a generalization bound for adversarial training. Specifically, the authors use a second-order approximation and consider SGD with momentum to investigate how the learning rate, Hessian structure, and gradient noise affect the mean and covariance of the posterior distribution; the analysis is consistent with and validated by numerical experiments.
++ The paper is generally well-written and the overall framework is not difficult to follow.
++ Under the second-order approximation, it is novel to derive the closed-form posterior covariance under both the constant-learning-rate and learning-rate-decay regimes.
++ The analysis is relatively comprehensive: it considers many factors that may affect the generalization error, including the momentum mechanism, gradient noise, Hessian structure, and learning rate. The theoretical analysis is qualitatively consistent with the numerical experiments.
1. The major concern is the restrictive assumptions:
* Assumption 3.3: I do not think the posterior distribution after adversarial training of general deep neural networks is Gaussian. The authors could instead assume that the posterior is a mixture of several super-Gaussian distributions, as the probability density generally concentrates during training, and different initializations lead to converged parameters near different local minima.
* Assumption 3.5 is only applicable when $w$ is close to $w^*$. This is somewhat in contradiction with Assumption 3.3. When we use a Gaussian distribution as the random initialization, the parameter distribution in the early steps is close to Gaussian, but $w$ is far from $w^*$. On the other hand, in the late phase of training, when $w$ is close to $w^*$, the distribution of the parameters is not Gaussian. The authors could perhaps add assumptions, such as those used in lazy training, under which the parameters do not move much during training; however, this would introduce additional conditions.
2. There is a considerable gap between the theory and practice. The theoretical analysis does not exploit any property unique to adversarial training. For example, in practice we observe a larger robust generalization gap when the magnitude of the adversarial perturbation is larger (i.e., larger $\epsilon = \|\delta\|_p$), but I do not see how this variable affects the bound on the generalization gap. In an extreme case, when the adversarial perturbation's magnitude is zero, does the bound in Theorem 4.7 reduce to an analysis of standard training (see the display after this list)? What makes the results special for adversarial training? I think this part needs further elaboration.
3. The experiments are not comprehensive. The authors compare the performance of adversarial training and AWP, which covers the factors of Hessian structure (AWP prefers flatter minima) and learning rate (both use learning rate decay). However, the effect of gradient noise (which is mentioned in the abstract and the introduction) is not adequately discussed or studied. In addition, it would be better to compare the empirical generalization gap with the theoretical one given by Theorem 4.5. The results in Table 1 and Table 2 are not convincing enough.
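To spell out the $\epsilon \to 0$ question in point 2 above: using the usual definition of the empirical adversarial risk (which I assume the paper adopts),
$$\hat{R}_{\rm adv}(w, S) = \frac{1}{n}\sum_{i=1}^{n}\ \max_{\|\delta_i\|_p \le \epsilon} \ell(w; x_i + \delta_i, y_i), \qquad \hat{R}_{\rm adv}(w, S)\Big|_{\epsilon = 0} = \frac{1}{n}\sum_{i=1}^{n} \ell(w; x_i, y_i),$$
so at $\epsilon = 0$ the empirical adversarial risk coincides with the standard empirical risk, and one would expect the bounds to recover, or at least be comparable to, an analysis of standard training.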
Minor issues:
1. Based on the analysis in the appendix, $\rho_i$ in Equation (14) actually depends on $\eta_2$; the authors should clearly indicate this in the main text to avoid confusion, because the right-hand side should be independent of $k$ when $\eta_1 = \eta_2$.
2. Some missing related literature:
* The convergence of adversarial training: "On the Convergence and Robustness of Adversarial Training" (2019); "On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them" (2020).
* More literature about robust overfitting (**the authors should technically compare with the bounds in these works**): "Non-vacuous Generalization Bounds for Adversarial Risk in Stochastic Neural Networks" (2024); "On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training" (2024).
In general, I think the research in this work can contribute to the machine learning community, but the manuscript is not ready for publication given the concerns above. I encourage the authors to address my concerns during the rebuttal and will reconsider my rating accordingly.
The questions are pointed out in the weakness part. Please answer them one by one.
1. [Weakness 1] How are the assumptions satisfied in practice? Is it possible to provide weaker and more generic assumptions in place of Assumption 3.3 and Assumption 3.5? I believe these assumptions work well for a convex problem such as linear regression, but I do not see how they are satisfied by deep neural networks.
2. [Weakness 2] What makes the results special for adversarial training? How do adversarial perturbations (especially their magnitude) affect the generalization bound in your theorems?
3. [Weakness 3] More experiments validating the effect of gradient noise are needed (e.g., using different batch sizes; see the sketch after this list). In addition, it would be better to compare the empirical generalization gap with the theoretical one given by Theorem 4.5.
4. Please pay attention to the minor issues and missing literature pointed out above.
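Regarding question 3, a hypothetical illustration (simple linear regression with squared loss, not the paper's models) of why batch size is a natural knob for the gradient-noise experiments: the covariance of the minibatch gradient around the full-batch gradient shrinks roughly as $1/B$.

```python
# Hypothetical setup: per-example squared-loss gradients of a linear model; the mean
# squared norm of the minibatch gradient noise decays roughly as 1/B with batch size B.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = rng.normal(size=d)                                  # an arbitrary parameter vector

per_example_grads = X * (X @ w - y)[:, None]            # gradient of 0.5*(x_i^T w - y_i)^2
full_grad = per_example_grads.mean(axis=0)

for B in (8, 32, 128, 512):
    noise = [per_example_grads[rng.choice(n, size=B, replace=False)].mean(axis=0) - full_grad
             for _ in range(500)]
    msn = np.mean([np.sum(v ** 2) for v in noise])      # mean squared norm of the noise
    print(f"B={B:4d}: mean squared gradient-noise norm {msn:.4f}")
```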
Fully human-written |