RDNAS: Robust Dual-Branch Neural Architecture Search
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper proposes Robust Dual-Branch NAS (RDNAS), a DARTS-based framework that utilises two parallel branches to jointly optimise clean and adversarial accuracy. It employs adversarial training directly and uses a Robust Outlier-Aware Shapley Estimator (ROSE) for operation scoring, for pruning guidance during search, and for discretization. Experiments on image classification tasks and various adversarial attacks demonstrate improvements over state-of-the-art methods.
- It’s an interesting idea to include an additional cell type (branch) in the supernet to enable better adversarial robustness already during the search.
- The cross-dataset transfer in Table 3 demonstrates promising results.
- Using a combination of interquartile range and median-of-means as statistical techniques in ROSE to guide the search is a novel idea, which could mitigate the known poor ranking consistency of DARTS (a minimal sketch of this combination follows after this list).
- Code is provided.
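To make the MoM/IQR point concrete, below is a minimal sketch (not the authors' code) of how median-of-means and IQR filtering might combine to robustly score an operation's sampled marginal contributions; the function name and the parameters `n_groups`, `beta`, and `gamma` are illustrative assumptions:

```python
import numpy as np

def rose_score_sketch(deltas, n_groups=4, beta=0.4, gamma=1.5):
    """Illustrative robust score for one operation's sampled marginal
    contributions (e.g., Shapley-style deltas under adversarial training)."""
    deltas = np.asarray(deltas, dtype=float)

    # Median-of-means: split samples into groups, average each group, and
    # take the median of the group means -- robust to heavy-tailed noise.
    groups = np.array_split(np.random.permutation(deltas), n_groups)
    mom = np.median([g.mean() for g in groups])

    # IQR filtering: discard samples outside [Q1 - gamma*IQR, Q3 + gamma*IQR]
    # before averaging, so single adversarial outliers cannot dominate.
    q1, q3 = np.percentile(deltas, [25, 75])
    iqr = q3 - q1
    inliers = deltas[(deltas >= q1 - gamma * iqr) & (deltas <= q3 + gamma * iqr)]
    iqr_mean = inliers.mean() if inliers.size else mom

    # Weighted combination, in the spirit of the paper's
    # Score = (1 - beta) * m + beta * v.
    return (1 - beta) * mom + beta * iqr_mean

# Noisy marginal contributions with two extreme outliers:
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.3, 0.05, 40), [5.0, -4.0]])
print(rose_score_sketch(samples))  # stays close to 0.3 despite the outliers
```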
- DARTS often searches with a smaller network (fewer cells), while inference is performed on a larger one (more stacked cells). This is common, but it relies on the assumption that the subnet ranking is consistent between search and inference, an assumption DARTS is known to violate. In addition, training the found network from scratch is another bottleneck. These issues are largely not mentioned, although using ROSE could provide some benefits here. A deeper discussion of the known issues of DARTS and how this paper tackles them is needed to show why DARTS should be used at all. Furthermore, using adversarial training both during the search and for the found network seems like a significant computational overhead.
- There is no information about the networks that were actually found.
- Because the search space is based on DARTS, there is no possibility of going beyond CNN-type architectures. I’m not sure this should still be the direction for NAS.
- Many one-shot works exist that overcome the scaling issues of DARTS, so why use DARTS here?
- Because adversarial training is already used during the search, standard training is not possible. How can we evaluate the native robustness of the found network without adversarial training?
- There is no reproducibility statement.
- A fair comparison in Table 1 would also use a similar amount of FLOPs. I’m not sure the rather marginal improvement justifies the additional FLOPs.
- Given the vast amount of NAS literature using zero-cost proxies to search for robust networks quickly, why is DARTS necessary, or could such proxies be combined with this approach to also account for the adversarially trained weights?
*Missing literature*:
Xiangxiang Chu, Bo Zhang, and Ruijun Xu. FairNAS: Rethinking Evaluation Fairness of Weight Sharing Neural Architecture Search. In ICCV, 2021.
J. Jeon, Y. Oh, J. Lee, D. Baek, D. Kim, C. Eom, and B. Ham. Subnet-Aware Dynamic Supernet Training for Neural Architecture Search. In CVPR, 2025.
- More detailed information about the transfer experiments shown in Table 2 should be provided to better understand what is meant by this experiment.
- How many runs were conducted?
- What do the resulting cells look like? Are there topological differences across different runs, or also during the search?
- Which operations were mostly chosen and which were discarded by ROSE?
- Is the supernet after training biased toward a certain type of network (e.g., low-complexity, wide, deep)?
Fully human-written
---
RDNAS: Robust Dual-Branch Neural Architecture Search
Soundness: 2: fair
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper proposes RDNAS (Robust Dual-Branch Neural Architecture Search), a framework that jointly optimizes clean accuracy and adversarial robustness by introducing a dual-branch cell. To stabilize architecture evaluation under adversarial training, it introduces ROSE (Robust Outlier-Aware Shapley Estimator), which combines median-of-means and interquartile-range filtering to make operation-scoring decisions more reliable.
- I found the paper well-written and easy to read.
- The provided contributions have the potential to be relevant.
In general, the paper would benefit from a stronger motivation and fewer vague claims. Here are the main weaknesses I found:
**Relevance of ROSE block**: It is not clear to me why the ROSE mechanism is actually needed. The authors say that it avoids gradient noise, but this should be described in more detail, as it currently feels generic. From the adversarial robustness literature, it is commonly known that this issue mostly arises when there is inherent randomness in the inference process, yet I fail to see where the Shapley-based estimation introduces noise. Also, from a practical perspective, I think this could be solved more simply by using EoT approaches [ext_ref_1, ext_ref_2] (a minimal sketch follows below).
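For reference, here is a minimal Expectation-over-Transformation sketch in the spirit of [ext_ref_1]; the stochasticity is emulated by Gaussian input noise, which is an assumption on my part, not the paper's setting:

```python
import torch

def eot_gradient(model, x, y, loss_fn, n_samples=10, sigma=0.1):
    # Average input gradients over the randomness of the forward pass
    # (emulated here by Gaussian input noise) so that a single noisy
    # gradient sample cannot mislead the attack or the operation score.
    grad = torch.zeros_like(x)
    for _ in range(n_samples):
        x_t = (x + sigma * torch.randn_like(x)).detach().requires_grad_(True)
        loss = loss_fn(model(x_t), y)
        grad += torch.autograd.grad(loss, x_t)[0]
    return grad / n_samples
```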
**Separate cells over different losses**: I understand the reasoning that led the authors to design two separate cells. Still, it is not clear to me how and why this should be better than having a single cell with a different loss, such as the TRADES loss, which accounts for both clean accuracy and adversarial robustness (sketched below). I would also have expected an ablation study in this regard.
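For concreteness, the single-cell alternative I have in mind would train one shared cell with the TRADES objective (Zhang et al., 2019), roughly as sketched here; `x_adv` would normally be crafted by maximizing the same KL term:

```python
import torch.nn.functional as F

def trades_loss(model, x_clean, x_adv, y, beta=6.0):
    # Clean cross-entropy plus a KL term that pulls adversarial outputs
    # toward the clean ones -- a single loss balancing both objectives.
    logits_clean = model(x_clean)
    logits_adv = model(x_adv)
    natural = F.cross_entropy(logits_clean, y)
    robust = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_clean, dim=1),
                      reduction="batchmean")
    return natural + beta * robust
```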
**Other issues, not necessarily minor:**
- The commonly accepted standard for adversarial robustness evaluation is now AutoAttack, which reduces possible robustness over-estimation. I appreciate that the authors use it, but their method does not achieve the best robustness under this most relevant of the attack methods used.
- The sentence: "Nevertheless, many approaches still depend on predefined templates and heuristic evaluations (e.g., PGD accuracy), which can introduce statistical noise and obscure robustness-critical factors. This motivates the need for principled, robustness-aware NAS frameworks that dynamically identify sensitivity in operations and architectural choices, enabling the discovery of inherently robust models" does not make much sense to me in general, and it is also not coherent with what the authors do in practice (Eq. 12 is based on PGD). I fail to see where the highly principled, non-PGD-based evaluation or approach is here.
- This sentence: "Architectures not explicitly designed for robustness often exhibit an inherent trade-off between clean accuracy and adversarial performance" should be supported by a reference, as it is once again extremely vague and generic.
[ext_ref_1]: Athalye, Anish, Nicholas Carlini, and David Wagner. "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples." International conference on machine learning. PMLR, 2018.
[ext_ref_2]: Pintor, Maura, et al. "Indicators of attack failure: Debugging and improving optimization of adversarial examples." Advances in Neural Information Processing Systems 35 (2022): 23063-23076.
- Could you elaborate more concretely on why the ROSE estimator is required? Specifically, what empirical or theoretical evidence supports the claim that adversarial training introduces high-variance or noisy gradients in this NAS context?
- What is the main advantage of employing two separate cells (normal and robust) over using a single shared cell trained with a composite objective such as the TRADES loss, which also balances clean and robust accuracy? Can you perform an ablation or comparative study isolating the contribution of the dual-branch design versus adversarial training alone?
- You mention that existing approaches rely on heuristic evaluations (e.g., PGD accuracy), whereas your approach is “principled.” Could you clarify in what sense RDNAS avoids such heuristics, given that Eq. (12) itself relies on PGD-generated adversarial examples?
- Could you better justify or rephrase the claim that “architectures not explicitly designed for robustness often exhibit an inherent trade-off between clean accuracy and adversarial performance”? A supporting citation or quantitative reference would strengthen this point.
- Your method does not achieve the strongest robustness under AutoAttack. Can you comment on this discrepancy and whether your approach might overfit to weaker attacks (e.g., $PGD^{20}$)?
Moderately AI-edited
---
RDNAS: Robust Dual-Branch Neural Architecture Search
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper presents RDNAS, a robust neural architecture search framework that jointly optimizes for clean accuracy and adversarial robustness through a dual-branch cell design and a Robust Outlier-Aware Shapley Estimator (ROSE). The work is well-motivated, experimentally comprehensive, and addresses an important challenge in NAS under adversarial conditions. The empirical results across CIFAR-10/100, SVHN, and Tiny-ImageNet are strong and convincingly demonstrate the robustness–accuracy trade-off. Overall, the paper is technically sound and clearly written, though some aspects should be clarified or extended to further strengthen the contribution and readability.
+ The dual-branch cell design is novel in the robust NAS community. This design cleanly separates and fuses normal and robust pathways, improving adversarial robustness without significantly enlarging the search space.
+ The proposed ROSE estimator effectively stabilizes Shapley-based operation scoring under noisy adversarial training.
+ The proposed RDNAS method achieves strong empirical performance across multiple datasets and attack settings, consistently outperforming hand-crafted and robust NAS baselines in terms of both clean and adversarial accuracy.
+ RDNAS conducts computationally efficient search with a small-sample strategy and shallow architecture, achieving high performance with low search cost.
- The ROSE estimator is a good idea, but the current presentation is somewhat dense. Adding a brief intuitive explanation of how Median-of-Means and IQR filtering improve robustness, perhaps with a small ablation or visualization, would make this component clearer.
- In Tables 1–4, consider boldfacing or underlining both the best and second-best results and adding short commentary lines like “RDNAS achieves the best balance between robustness and efficiency.” This helps readers quickly grasp the contribution.
- Some symbols (e.g., $\alpha_{e,o}^{(b)}$, $\Delta_{e,o}^{(s,b)}$) appear with slightly inconsistent superscripts or subscripts across sections. Unifying notation between Equations (15)–(19) and the algorithm pseudocode would improve readability.
- Briefly mention the approximate search time (e.g., 0.2 GPU days) in the main text (not only in the table). This helps readers appreciate the efficiency advantage compared to typical NAS methods.
Please refer to the Weaknesses part.
Moderately AI-edited
---
RDNAS: Robust Dual-Branch Neural Architecture Search
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Summary:
This paper proposes RDNAS (Robust Dual-Branch Neural Architecture Search), a new framework designed to automatically discover deep neural network architectures that are robust to adversarial attacks while maintaining high accuracy on clean (unperturbed) data. The core problem addressed is that most Neural Architecture Search (NAS) methods optimize for clean accuracy and ignore robustness, while existing robust NAS methods often suffer from unstable training due to noisy gradients from adversarial training.
The overall framework integrates adversarial training into a DARTS-like bilevel optimization process and uses ROSE to guide the selection of the final architecture. Experiments on CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet show that RDNAS discovers architectures that outperform both manually-designed networks and other state-of-the-art robust NAS methods in both clean accuracy and robustness against attacks like PGD and AutoAttack.
Contribution:
* A Novel Dual-Branch Cell Design: This architecture explicitly separates and fuses clean and robust feature pathways via an attention mechanism (see the sketch after this list). This is designed to effectively manage the trade-off between clean accuracy and adversarial robustness.
* ROSE, a Robust Scoring Mechanism: A new, principled scoring estimator based on Shapley values that is specifically designed to be stable under the noisy conditions of adversarial training by using MoM and IQR statistics.
* Strong Empirical Validation: The paper provides extensive experimental evidence that RDNAS discovers architectures with a superior balance of robustness and accuracy compared to existing baselines across multiple datasets and attack types.
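As I read the dual-branch design, the fusion could look roughly like the following channel-attention sketch; the module and parameter names here are my own illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Fuse a normal and a robust feature pathway with a learned
    channel-wise gate (illustrative reading of the dual-branch cell)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_normal, feat_robust):
        # A per-channel gate in [0, 1] decides how much of each branch to keep.
        a = self.gate(torch.cat([feat_normal, feat_robust], dim=1))
        return a * feat_normal + (1 - a) * feat_robust
```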
1. Principled and Robust Scoring: ROSE is a key strength. Instead of using noisy gradients or standard attributions, it builds on the principled game-theoretic concept of Shapley values and thoughtfully adapts it for a high-variance, adversarial setting using robust statistical estimators. This directly addresses a known failure point in previous robust NAS works.
2. State-of-the-Art Performance: The method achieves excellent results, outperforming a wide range of baselines (e.g., AdvRush, RACL, RobNet) on standard benchmarks. For instance, on CIFAR-10, it achieves high clean accuracy while also delivering top-tier robustness against PGD and AutoAttack.
1. High Complexity: The overall system is complex, combining a bilevel optimization, adversarial training in the inner loop, a custom dual-branch cell with attention, and the sophisticated ROSE estimator (which itself uses Shapley values, MoM, and IQR). This complexity could make reproduction and debugging difficult.
2. New Hyperparameters: The ROSE estimator introduces its own set of hyperparameters, such as the $\beta$ parameter that balances the MoM and IQR scores, the number of MoM groups $G$, and the IQR sensitivity $\gamma$. This adds extra tuning parameters to an already complex search process.
3. Limited Search Space: The method still operates within a conventional cell-based NAS search space, where the set of operations (conv, pool, etc.) is predefined. The work is limited to this cell topology, which makes transferring it to more complex transformer-based architectures difficult.
* Regarding the ROSE Estimator:
- The final ROSE score is a weighted sum: $Score_{e,o}^{(b)}=(1-\beta)m_{e,o}^{(b)}+\beta~v_{e,o}^{(b)}$. What is the performance impact of ablating these components? For example, what happens if $\beta=0$ (using only MoM) or $\beta=1$ (using only the IQR outlier score)?
- How sensitive is the final discovered architecture to the choice of $\beta$? The paper suggests a range of $[0.3, 0.5]$; does the performance degrade sharply outside this range? (A toy illustration below shows how the selected operation can flip with $\beta$.)
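A toy numerical illustration (all values hypothetical) of why this ablation matters: the $\beta$ mixing can flip which operation wins an edge, which in turn changes the discretized architecture:

```python
# Hypothetical (m, v) pairs: op_A has the higher MoM score,
# op_B the higher IQR/outlier score.
ops = {"op_A": (0.42, 0.10), "op_B": (0.35, 0.30)}
for beta in [0.0, 0.3, 0.5, 1.0]:
    scores = {name: (1 - beta) * m + beta * v for name, (m, v) in ops.items()}
    print(f"beta={beta:.1f} -> selected {max(scores, key=scores.get)}")
# The winner flips once beta exceeds 0.07 / (0.07 + 0.20) ~= 0.26,
# i.e., just below the paper's suggested range of [0.3, 0.5].
```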
* Regarding the Framework and Evaluation:
- The search uses a 7-step PGD ($PGD^7$), but the final evaluation includes stronger attacks like $PGD^{20}$, $PGD^{100}$, and AutoAttack. Does this indicate that an architecture found with a weaker attack generalizes well to stronger ones? (A minimal PGD sketch with a configurable step count follows below.)
- How well do the discovered architectures perform against different types of adversarial attacks not included in the main table, such as $l_2$ or $l_0$ norm-bounded attacks?
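For reference, a minimal $l_\infty$ PGD sketch with a configurable step count; `steps=7` corresponds to the search-time $PGD^7$, while `steps=20` or `steps=100` corresponds to the stronger evaluation attacks (the hyperparameters are common defaults, not necessarily the authors'):

```python
import torch

def pgd_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=7):
    # Random start inside the eps-ball, then iterated signed-gradient steps
    # with projection back onto the ball and the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv
```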
Fully AI-generated