ICLR 2026 - Reviews

Reviews

Summary Statistics

EditLens Prediction Count Avg Rating Avg Confidence Avg Length (chars)
Fully AI-generated 1 (25%) 6.00 3.00 4086
Heavily AI-edited 0 (0%) N/A N/A N/A
Moderately AI-edited 0 (0%) N/A N/A N/A
Lightly AI-edited 1 (25%) 6.00 3.00 2921
Fully human-written 2 (50%) 3.00 4.00 2608
Total 4 (100%) 4.50 3.50 3056
Exploiting Fine-Tuning Structures to Improve Adversarial Transferability on Downstream SAM

Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: The paper proposes a method for generating adversarial inputs to the open-weight SAM segmentation model such that the examples also transfer to fine-tuned versions of SAM, without any information about the fine-tuning dataset or the fine-tuned weights. The main idea rests on two pillars. First, a set of "ghost" models is created that contains "randomly fine-tuned" models: the fine-tuning structure (e.g., LoRA) is used, but the actual perturbations are random rather than learned. Second, this set of ghost models is attacked with a longitudinal ensemble attack, i.e., an iterative attack that uses only a small subset of the models in each iteration. The method is shown to work well empirically, and some theoretical results are also presented in its support.

Strengths: The paper proposes a method that demonstrably provides favorable results compared to related work. The targeted problem is interesting: if adversarial examples transfer to fine-tuned variants of foundation models, then open-weight models represent a vulnerability in this sense.

Weaknesses: The novelty of the paper is rather minimal in terms of the approach itself. Both the idea of using ghost models and longitudinal attacks are known from the literature; here the approach is applied to the fine-tuning setup.

The presentation of the paper, especially the formal parts (equations), has many issues. Eq. (1) is a very unusual (in fact, incorrect) way of defining the learning problem; it should be a minimization of the sum of sample losses or of the expectation of the loss. Eq. (2) is both meaningless and wrong: there is no requirement of "much larger" in general. Eq. (4) is announced as a parameter update, which it is not; it is the definition of the optimization problem, extended with an initialization statement that is not needed. Definition 1 is quite strange: in Eq. (5) the "L =" is not needed, and x' is defined in the preceding sentence but then used as a variable in the equation. Eq. (6) uses the notation $\Delta W$, which was previously used to denote the change in the parameters, but in (6) it denotes the change in the output.

Theorem 1 seems not to be true, and the proof contains basic issues that make the bounding incorrect. The theorem is formulated so that it should hold for any loss $L$, but essentially only the linear case is discussed, since the first step introduces a linear approximation. The approximation of the expectation of the cosine function is not justified, because the numerator and denominator are not independent. But the theorem is quite clearly not true even in the linear case, simply due to symmetry: in the linear case, the gradient of the ghost set cannot be closer to that of *every* possible fine-tuning than to that of the original model, while the claim is supposed to hold for every $f_d$.

The method works, but for an entirely different, in fact opposite, reason. The random ghost set makes sure that the attack focuses on those directions that lead to adversarial inputs *independently* of the fine-tuning. In other words, the ghost models help *ignore* the fine-tuning bits and focus on the core bits: the directions that are common to the original and the fine-tuned model, hence the transferability.

Questions: Can you please revise the theoretical discussion (or remove it completely if that is not possible)?

EditLens Prediction: Fully human-written
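For concreteness, the attack recipe described in this review and the ones below (random-LoRA ghost models plus a longitudinal, per-iteration-subset momentum attack on a feature-distance loss) can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation: the toy encoder, the function names (`sample_ghost_lora`, `longitudinal_ensemble_attack`), and all hyperparameters are assumptions made for the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class ToyEncoder(nn.Module):
    """Tiny stand-in for the SAM image encoder (two linear blocks)."""

    def __init__(self, dim=64):
        super().__init__()
        self.proj1 = nn.Linear(dim, dim)
        self.proj2 = nn.Linear(dim, dim)

    def forward(self, x, lora=None):
        h = self.proj1(x)
        if lora is not None:
            # Ghost model: add a random low-rank (LoRA-style) delta to the first projection.
            A, B = lora
            h = h + x @ A.T @ B.T
        return self.proj2(torch.relu(h))


def sample_ghost_lora(dim=64, rank=4, std=0.02):
    """Draw random (not learned) LoRA factors -- one 'randomly fine-tuned' ghost."""
    return torch.randn(rank, dim) * std, torch.randn(dim, rank) * std


def longitudinal_ensemble_attack(encoder, x, n_ghosts=20, subset=4,
                                 steps=10, eps=8 / 255, alpha=2 / 255, mu=1.0):
    """MI-FGSM-style loop that maximizes the feature distance to the clean image,
    attacking only a random subset of the ghost models at each iteration."""
    ghosts = [sample_ghost_lora() for _ in range(n_ghosts)]
    clean_feat = encoder(x).detach()
    x_adv, g = x.clone(), torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        idx = torch.randperm(n_ghosts)[:subset].tolist()   # longitudinal: small subset per step
        loss = sum(torch.norm(encoder(x_adv, ghosts[i]) - clean_feat) for i in idx)
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = mu * g + grad / grad.abs().mean().clamp(min=1e-12)  # momentum accumulation
        x_adv = (x_adv + alpha * g.sign()).detach()              # ascend the feature-distance loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)                 # project onto the L_inf ball
    return x_adv


encoder = ToyEncoder()
x = torch.rand(1, 64)
x_adv = longitudinal_ensemble_attack(encoder, x)
print("max perturbation:", (x_adv - x).abs().max().item())
```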
Exploiting Fine-Tuning Structures to Improve Adversarial Transferability on Downstream SAM

Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper investigates the adversarial transferability between SAM and its downstream fine-tuned variants. The authors introduce SETA, a method that exploits knowledge of the fine-tuning structures of downstream models to enhance transferability. Specifically, SETA constructs ghost SAMs that emulate possible fine-tuned variants by injecting sampled PEFT parameters, and employs a longitudinal ensemble strategy to iteratively craft more transferable perturbations. Experimental results demonstrate that SETA consistently outperforms established baselines such as MI-FGSM, PGN, GhostNet, and MUI-GRAT. Moreover, theoretical analysis and loss-surface visualizations substantiate the claim that structural awareness fosters better gradient alignment, thereby improving cross-model transferability.

Strengths:
1. This paper is among the first to take a practical perspective by explicitly modeling downstream fine-tuning structures to improve the adversarial transferability of SAM in domain-specific scenarios.
2. The idea of constructing "ghost SAMs" through sampling PEFT parameters is both novel and insightful. It extends the ghost network concept in a structure-aware manner that effectively captures downstream adaptation mechanisms.
3. Extensive experiments across four distinct SAM variants consistently demonstrate substantial improvements in attack performance.
4. The paper is well-organized and thorough, featuring detailed algorithmic pseudocode, comprehensive ablation studies (e.g., distribution types, hyperparameters, and number of ghost SAMs), and thoughtful discussions on limitations.

Weaknesses:
1. While the paper claims to operate within the same context as [1], it lacks a thorough comparison with that baseline and does not clearly articulate the distinct problem formulation or contribution beyond the prior work.
2. Table 4 indicates that SETA outperforms competing methods even when $k = 1$. However, the paper does not examine whether SETA would maintain this advantage if other methods were also permitted to ensemble multiple open-source models, which could affect the fairness of the comparison.
3. The motivation for adopting Gaussian sampling could be further clarified, as Table 3 suggests that different parameter distribution types lead to comparable performance.
4. Most downstream models evaluated in the experiments belong to the medical domain, and the improvement observed on COD is relatively marginal. A deeper analysis of this phenomenon would strengthen the paper. Additionally, including more diverse downstream models (e.g., shadow segmentation or other non-medical tasks) would help validate the generality of the proposed approach.

[1] Song Xia, Wenhan Yang, Yi Yu, Xun Lin, Henghui Ding, Lingyu Duan, and Xudong Jiang. Transferable adversarial attacks on SAM and its downstream models. arXiv preprint arXiv:2410.20197, 2024.

Questions: Please answer the questions proposed in the weakness section.

EditLens Prediction: Lightly AI-edited
Exploiting Fine-Tuning Structures to Improve Adversarial Transferability on Downstream SAM

Soundness: 2: fair
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper presents SETA, an adversarial attack method targeting downstream fine-tuned variants of the SAM model. The authors' approach is ingenious: by leveraging fine-tuning structures to construct "ghost models" and combining them with a longitudinal ensemble strategy, the attack's transferability is enhanced. The paper has a clear structure and a well-defined problem definition, making it easy for readers to understand. However, the experiments were only conducted under similar fine-tuning scales and did not verify effectiveness in more complex or defensive conditions. Overall, the research idea is novel and provides valuable inspiration for structure-aware adversarial attacks.

Strengths: The paper is written clearly and concisely with a rigorous structure. The authors provide an exact problem definition and notation explanation at the beginning, enabling readers to quickly grasp the main research framework and hypotheses, which greatly enhances the clarity and logical coherence of the paper. Furthermore, the proposed SETA method is conceptually innovative: it ingeniously utilizes the structural characteristics of downstream fine-tuning to construct "ghost SAMs" and combines them with a longitudinal ensemble strategy to improve transferability.

Weaknesses: Although the proposed SETA method is innovative and achieves good experimental results, it still has some shortcomings. First, the method assumes that the downstream LoRA/Adapter parameters follow a Gaussian distribution, but this assumption lacks empirical verification, and its validity across different tasks remains uncertain. Second, SETA relies on a known fine-tuning structure and is applicable to PEFT models, but its generalization to full-parameter or prompt fine-tuning scenarios is limited. The experimental part mainly focuses on similar fine-tuning scales and does not analyze performance under different fine-tuning intensities, structural differences, or defense mechanisms. Moreover, the independent contributions of the ghost SAMs and the longitudinal ensemble are not evaluated separately, so the source of the performance improvement is unclear. Overall, the main limitations of the paper lie in insufficient verification of the theoretical assumptions, a narrow experimental scope, and insufficient robustness analysis.

Questions:
1. The authors assume that the LoRA or Adapter parameters of the downstream SAM model can be approximately represented by a Gaussian distribution, and construct the "ghost SAMs" on this basis in both the theoretical analysis and the experiments. Although this assumption simplifies the method and the modeling, the paper does not provide empirical evidence that the Gaussian assumption actually holds. In practice, different tasks, LoRA configurations, and optimization settings lead to different parameter distributions, so the generality of this assumption across fine-tuning scenarios remains uncertain.
2. The paper introduces a longitudinal ensemble strategy to enhance the stability of adversarial attacks. However, the experiments do not include an independent comparison between using only ghost SAMs without the ensemble and using only the ensemble strategy without constructing ghost SAMs. As a result, it is not easy to attribute the performance gains to either component clearly. This makes the contribution of the longitudinal ensemble module somewhat ambiguous and leaves the overall effect of each part insufficiently validated.
3. The paper mainly evaluates attack performance in terms of success rate and performance degradation, but does not investigate the robustness of the proposed attack under input perturbations or common defense mechanisms such as adversarial training.
4. The experiments only cover downstream models with similar LoRA/Adapter configurations. It is not clear whether SETA is still effective when the fine-tuning intensity is stronger or the structure is different, which raises questions about its generalization ability across different fine-tuning scales.

EditLens Prediction: Fully AI-generated
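As an illustration of the kind of empirical check requested in question 1 above, here is a minimal sketch (not from the paper) of how one could probe the Gaussian assumption on fine-tuned LoRA/Adapter weights; the tensor names, the `key_filter` heuristic, and the checkpoint path are illustrative assumptions.

```python
import torch
from scipy import stats


def gaussianity_report(state_dict, key_filter="lora"):
    """Per-tensor skew, excess kurtosis, and a D'Agostino-Pearson normality test
    for fine-tuned LoRA/Adapter weights (null hypothesis: the entries are Gaussian)."""
    for name, w in state_dict.items():
        if key_filter not in name.lower():
            continue
        v = w.detach().flatten().float().cpu().numpy()
        _stat, p = stats.normaltest(v)
        print(f"{name}: mean={v.mean():.2e} std={v.std():.2e} "
              f"skew={stats.skew(v):.2f} ex.kurtosis={stats.kurtosis(v):.2f} p={p:.3f}")


# Demo on synthetic tensors; a real check would instead load a LoRA-tuned downstream
# checkpoint, e.g. state_dict = torch.load("<downstream_lora>.pth", map_location="cpu").
demo = {
    "blocks.0.attn.lora_A": torch.randn(4, 768) * 0.02,                               # Gaussian-like
    "blocks.0.attn.lora_B": torch.distributions.Laplace(0.0, 0.02).sample((768, 4)),  # heavier tails
}
gaussianity_report(demo)
```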
Exploiting Fine-Tuning Structures to Improve Adversarial Transferability on Downstream SAM

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper studies black-box adversarial transferability from the original Segment Anything Model (SAM) to its downstream fine-tuned variants under a parameter-efficient fine-tuning (PEFT) setting. The authors propose SETA, which (1) constructs "ghost" SAMs by injecting sampled PEFT layers (e.g., LoRA or adapters) into the base encoder, and (2) performs a longitudinal ensemble attack by randomly selecting a subset of these ghost models at each iteration. Empirical evaluation against three downstream SAM variants shows that SETA outperforms MI-FGSM, Ghost Networks, PGN, and MUI-GRAT in terms of feature-distance loss.

Strengths:
1. The paper is well-structured and easy to follow.
2. A comprehensive empirical study demonstrates the effectiveness of the proposed method.

Weaknesses:
1. The attack presumes exact access to the downstream PEFT configuration (insertion points, ranks, etc.), which may not hold in realistic black-box settings where fine-tuning details are proprietary or obfuscated.
2. The selection of the ghost parameters $\Delta W$ is based on random initialization (sampled from a Gaussian); there is no theoretical analysis or guidance on choosing better parameters.
3. All experiments use variants of the ViT-based SAM. The transferability of SETA across different vision transformers (ViT-H or ViT-L) remains unexplored.
4. Adapters are also a common way of fine-tuning SAM, whereas the methodology in this paper focuses on the LoRA design and lacks a discussion of adapters.
5. Given that adversarial attacks and defenses are two closely related areas, it would be desirable to add a discussion of adversarial defenses to the related work section.

Questions: My major concerns are weaknesses 2, 3, 4, and 5. I would like to raise my score if the authors can address those concerns during the rebuttal.

EditLens Prediction: Fully human-written