|
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper proposes the method of non-transferable examples, which tackles the problem of model-specific data authorization. The authors establish formal bounds with solid theoretical analysis and demonstrate the effectiveness of the method with strong empirical results.
- I really like the idea that "NEs leverage a structural property of neural networks in which many input directions have negligible effect on early features, yielding a model-specific set of insensitivity directions that rarely align across models." It is a great observation that is worth leveraging in this problem domain.
- The experimental results show strong performance in protecting the encoded data; the method achieves protection without retraining or expensive encryption.
- This paper provides solid theoretical analysis, with bounds for both authorized and unauthorized performance, that is mathematically sound and adequately discussed.
- It would be better if more visual examples could be provided rather than just two images.
- I have several more questions that need the authors' further clarification regarding applicability, limitations, etc. Please see [Questions].
- What are the benefits of this method compared with simply adding a shared Gaussian noise mask to the data (and subtracting the mask at inference)? See the toy comparison sketched after these questions.
- I am a bit concerned about how this method eventually changes the image. As shown in Figure 1, at around 40 dB the method largely affects the visual quality of the image and makes it hard for a human to identify. In this case, how would you use this method in real applications? Can you provide example scenarios where we do not care what the image looks like after applying your method and only care about inference on a specific model?
- Can this method be applied across models, i.e., can the data owner allow the NEs to be used by different clients?
- Are there any approaches that could undermine this method? For example, if an attacker collects some data with the correct labels, can they tune their model's initial layers to adapt to this method?
- What is the computational overhead of this method?
- What are the limitations of the method? What kinds of models can this method not be applied to?
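Concretely, here is the toy comparison I have in mind for the Gaussian-mask question (my own illustration, not the authors' code; the two random matrices stand in for the first-layer maps of an authorized and an unauthorized model): a shared mask is exactly removable by anyone who holds it, regardless of the model, whereas a perturbation drawn from one model's least-sensitive first-layer directions is nearly invisible to that model but not to another.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256
W_auth = rng.standard_normal((128, d)) / np.sqrt(d)    # toy first layer of the authorized model
W_other = rng.standard_normal((128, d)) / np.sqrt(d)   # toy first layer of an unauthorized model
x = rng.standard_normal(d)

# (a) Shared Gaussian mask: protection reduces to keeping the mask secret,
# independent of any model; whoever holds the mask recovers x exactly.
mask = rng.standard_normal(d)
assert np.allclose((x + mask) - mask, x)

# (b) Model-specific subspace perturbation: tied to W_auth's geometry, no shared key.
_, _, vt = np.linalg.svd(W_auth, full_matrices=True)
delta = vt[-d // 2:].T @ rng.standard_normal(d // 2)    # least-sensitive directions of W_auth
delta *= np.linalg.norm(x) / np.linalg.norm(delta)      # give it energy comparable to the input

print("||W_auth  @ delta|| =", np.linalg.norm(W_auth @ delta))   # ~0: authorized layer barely reacts
print("||W_other @ delta|| =", np.linalg.norm(W_other @ delta))  # large: other layer is disrupted
```
|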
Fully human-written |
|
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The paper proposes non-transferable examples, a training-free, input-side recoding that preserves utility for a single authorized model while sharply degrading utility for any other model. The key idea is to add a calibrated perturbation within a model-specific low-sensitivity subspace of the target model’s first linear map, estimated via SVD. Theoretically, the paper provides bounds linking authorized retention to a spectral threshold and unauthorized degradation to cross-model spectral misalignment via the Hoffman–Wielandt inequality. Experiments on CIFAR-10 and ImageNet across multiple vision backbones and on VLMs demonstrate strong authorized performance together with consistent degradation on other models.
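For concreteness, here is a minimal NumPy sketch of the recoding step as I understand it from the description above (the function name, the choice of keeping the bottom half of the right singular directions, and the energy calibration are my own assumptions, not the authors' implementation):

```python
# Minimal sketch of the NE recoding described above (my reading, not the authors' code).
import numpy as np

def make_non_transferable(x, W_first, keep_ratio=0.5, energy=1.0, seed=0):
    """x: flattened input of shape (d,); W_first: target model's first linear map of shape (m, d)."""
    rng = np.random.default_rng(seed)
    _, _, vt = np.linalg.svd(W_first, full_matrices=True)
    k = int(vt.shape[0] * keep_ratio)
    v_low = vt[-k:].T                        # right singular directions with the smallest
                                             # singular values: the low-sensitivity subspace
    delta = v_low @ rng.standard_normal(k)   # random perturbation confined to that subspace
    delta *= energy * np.linalg.norm(x) / np.linalg.norm(delta)   # calibrated magnitude
    return x + delta
```

By construction, $W_{\text{first}}\,\delta$ is small whenever the discarded singular values are small, so the authorized model's first-layer features barely move, while another model's first layer, whose insensitive directions generally differ, responds strongly to the same perturbation.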
- The paper tackles a novel and societally relevant problem: enforcing model-level usage control without retraining or cryptographic infrastructure. Unlike anti-learnability or differential privacy approaches, the proposed method operates directly at inference time and does not require access to non-target models, representing a new protection paradigm.
- The method is mathematically grounded, with clear theoretical analysis connecting spectral properties to performance retention/degradation through bounded inequalities.
- Experiments span both vision backbones and multimodal VLMs, with consistent evidence of non-transferability.
- The analysis focuses solely on the first-layer linear map of the network, assuming subsequent layers implicitly preserve the property. This simplification may not hold for architectures with highly nonlinear early blocks or skip connections (e.g., ResNet).
- The adversary is allowed preprocessing, i.e., an adaptive adversary, but the paper does not evaluate adversaries that learn an inversion (e.g., distilling a new first layer aligned to $V$); a sketch of such an attack is given after this list. The empirical reconstruction attempts are classical denoising and do not include learned inversion with supervision on even a small clean subset.
- Some comparisons (e.g., with FHE) rely on cited rather than reproduced results, and differential privacy is known to address a different threat model.
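A minimal PyTorch sketch of the kind of learned-inversion adversary meant above; `unauth_model`, the loader of (encoded input, true label) pairs, and the choice of a single linear re-alignment layer are hypothetical stand-ins, not something evaluated in the paper:

```python
# Hypothetical adaptive attack: learn a linear re-alignment in front of a frozen
# unauthorized model, supervised on a small set of encoded inputs with clean labels.
import torch
import torch.nn as nn

def fit_realigner(unauth_model, small_labeled_loader, d, epochs=10, lr=1e-3):
    realign = nn.Linear(d, d)                       # the adversary's learnable "new first layer"
    nn.init.eye_(realign.weight)
    nn.init.zeros_(realign.bias)
    unauth_model.eval()
    for p in unauth_model.parameters():
        p.requires_grad_(False)                     # only the re-aligner is trained
    opt = torch.optim.Adam(realign.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_enc, y in small_labeled_loader:       # x_enc: flattened encoded inputs of shape (B, d)
            loss = loss_fn(unauth_model(realign(x_enc)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return realign
```

If even a small clean-labeled subset suffices to recover most unauthorized accuracy this way, the protection is weaker than the classical-denoising baselines suggest.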
- How robust is the non-transferability property when unauthorized models share partial architecture with the authorized model?
- Does the low-sensitivity subspace remain stable under data-domain shifts (e.g., different ImageNet subclasses or out-of-distribution inputs)?
- How does the computational cost of subspace estimation scale with model dimension, and can it be applied to large models such as VLMs? |
Fully AI-generated |
|
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper introduces Non-Transferable Examples (NEs) — a training-free, input-side method for model-specific data authorization. By analyzing the first-layer weights of the target model via singular value decomposition, it recodes inputs within a low-sensitivity subspace so that the authorized model’s performance is preserved while unauthorized models experience severe degradation. Theoretical analysis links this effect to spectral misalignment, and experiments on vision and multimodal tasks show that NEs retain authorized accuracy while rendering other models unusable.
1. The paper introduces a new perspective on data authorization by proposing Non-Transferable Examples (NEs)—a lightweight, training-free method that ensures data usability is restricted to a specific target model. This formulation shifts the control from model training or encryption to input-level encoding, representing a genuinely novel conceptual contribution.
2. The proposed approach is simple, efficient, and broadly applicable. Since NEs only require access to the first-layer weights of the target model, they can be deployed across diverse architectures, including CNNs, Vision Transformers, and vision-language models, without retraining. This makes the method attractive for real-world deployment.
3. The authors support their approach with a solid mathematical analysis. By leveraging spectral theory and the Hoffman–Wielandt inequality, they rigorously show why the encoded data maintain performance on the authorized model but degrade substantially on others due to subspace misalignment. The theoretical framework aligns well with empirical findings.
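For reference, a standard singular-value form of this inequality (usually attributed to Mirsky and often stated alongside the Hoffman–Wielandt eigenvalue bound) is: for any $A, B \in \mathbb{R}^{m \times n}$ with singular values ordered $\sigma_1 \ge \sigma_2 \ge \cdots$,
$$\sum_i \bigl(\sigma_i(A) - \sigma_i(B)\bigr)^2 \;\le\; \|A - B\|_F^2,$$
so two first-layer maps whose spectra differ substantially must also be far apart in Frobenius norm.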
1. If many encoded samples are publicly available and generated under a consistent subspace or seed, an adversary could apply PCA or covariance-based spectral analysis to infer correlated energy patterns and approximate the manipulated subspace, partially neutralizing the encoding and restoring unauthorized performance (a rough sketch of this attack is given after this list).
2. NE is an input-side, inference-stage defense that becomes ineffective when large-scale datasets of encoded samples are used directly for training or fine-tuning: attackers can adapt by training on the encoded data or by mixing it with original samples.
3. Because NE encoding depends on precise spectral alignment with the target model’s first-layer subspace, common transformations—compression, resizing, cropping, or illumination shifts—may distort the encoded signals and weaken the authorization effect. Robustness under realistic noise and preprocessing variations remains an open issue.
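A rough NumPy sketch of the covariance-based attack in point 1 (my own illustration, not an evaluated attack; it assumes the perturbations concentrate enough energy along a fixed subspace to dominate the top principal components, which may not hold in practice):

```python
# Hypothetical PCA attack: estimate the shared NE subspace from many encoded
# samples and project that energy out before querying an unauthorized model.
import numpy as np

def strip_dominant_subspace(x_encoded, k):
    """x_encoded: (n, d) encoded samples; k: assumed dimension of the NE subspace."""
    x_centered = x_encoded - x_encoded.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x_centered, full_matrices=False)   # top rows = principal directions
    v_est = vt[:k]                                              # estimated NE directions, shape (k, d)
    proj = np.eye(x_encoded.shape[1]) - v_est.T @ v_est         # projector onto their complement
    return x_encoded @ proj                                     # samples with that energy removed
```

Reporting unauthorized accuracy after such a projection would make the robustness claim more convincing.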
Please see the weaknesses above. |
Fully AI-generated |
|
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization |
Soundness: 4: excellent
Presentation: 3: good
Contribution: 4: excellent
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper introduces NEs, a data-centric method designed for model-specific authorization in machine learning systems.
The idea is to recode data inputs via perturbations aligned with the target model's low-sensitivity subspace, allowing the authorized model to maintain high performance while drastically degrading utility for any unauthorized model.
No model retraining is needed. NEs are also agnostic to the target architecture and operate solely at the input side. Both theoretical guarantees and a comprehensive empirical evaluation are provided, demonstrating the method's effectiveness.
- The problem is somewhat novel and interesting to me. It could address a pressing gap in AI governance in a practical and computationally feasible way, without retraining or cryptographic overhead.
- The empirical validation is comprehensive, spanning many vision backbones and VLMs.
- Theoretical guarantees only cover the first linear layer; deeper nonlinear effects and adversarial undoing are unexplored. How do perturbations propagate through deeper nonlinear layers, and can adaptive adversaries undo them?
- The hyperparameters controlling the perturbation magnitude seem not to be well discussed and may not generalize across models or data. Can a systematic hyperparameter calibration method be developed for different settings? One possible sweep is sketched after this list.
- My concern is whether NEs resist sanitization/purification/denoising techniques. If they do not, an attacker can simply purify NEs before feeding them to unauthorized models.
- The runtime and computational cost of generating NEs are not well discussed, and generation might take some time. How practical is NE generation for real-time or large-scale use?
- There is no evaluation in high-stakes or privacy-sensitive real-world domains. How effective and applicable are NEs in critical privacy-focused applications (e.g., health care)?
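One possible calibration sweep for the hyperparameter concern above (the helpers `encode_fn` and `accuracy` are hypothetical stand-ins; this is just one way the magnitude could be chosen, not the authors' procedure):

```python
# Hypothetical calibration: pick the largest perturbation energy whose drop in
# authorized accuracy on a held-out set stays within a tolerance of clean accuracy.
def calibrate_energy(encode_fn, accuracy, model, val_x, val_y,
                     energies=(0.25, 0.5, 1.0, 2.0, 4.0), max_drop=0.01):
    clean_acc = accuracy(model, val_x, val_y)
    best = None
    for e in sorted(energies):
        enc_x = encode_fn(val_x, energy=e)                    # NE-encode the validation inputs
        if clean_acc - accuracy(model, enc_x, val_y) <= max_drop:
            best = e                                          # stronger protection, still usable
    return best, clean_acc
```

A similar sweep over the subspace dimension would help address whether the settings generalize across models and datasets.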
See Weaknesses |
Lightly AI-edited |