ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction Count Avg Rating Avg Confidence Avg Length (chars)
Fully AI-generated 2 (67%) 5.00 4.00 3359
Heavily AI-edited 0 (0%) N/A N/A N/A
Moderately AI-edited 1 (33%) 4.00 3.00 2262
Lightly AI-edited 0 (0%) N/A N/A N/A
Fully human-written 0 (0%) N/A N/A N/A
Total 3 (100%) 4.67 3.67 2993
Each review below lists the submission title, its ratings (soundness, presentation, contribution, rating, confidence), the review text, and the EditLens prediction.
Title: Semantic Calibration in Media Streams
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary: This paper argues that instead of merely detecting whether media content is synthetic, one should directly calibrate its semantic distribution to mitigate potential semantic deception. The authors theoretically demonstrate that conventional deepfake detection methods based on non-semantic artifacts will ultimately fail as generative models approach perfection. Accordingly, they propose a Semantic Calibration framework, which employs captioning models and language models to perform rejection sampling in the semantic space, thereby enabling cross-modal content filtering.
Strengths:
1. The authors are the first to formally define semantic deception and introduce a distribution-level metric for it, providing a perspective that transcends the traditional binary paradigm of authenticity detection.
2. The proposed pipeline (extracting semantics via captioning models, estimating semantic distribution ratios with two language models, and performing rejection sampling) is concise, transparent, and interpretable through token-level saliency analysis; a minimal sketch of this pipeline is given after this review.
Weaknesses:
1. The image experiments are conducted only on COCO, CIFAR-10, CIFAR-100, and ImageNet. It is recommended that the paper include evaluations on datasets more representative of security-related applications, such as those from the deepfake or AIGC-generated content detection domains.
2. In practical scenarios, a single image may correspond to multiple valid descriptions (e.g., "a man playing guitar" vs. "a musician performing on stage"), and a text segment may have several semantically equivalent paraphrases. Such multi-description diversity introduces representational ambiguity in the semantic space, which may cause the model to misinterpret natural linguistic variability as semantic shift or deception.
3. The method assumes captions are an (almost) lossless stand-in for semantics (aiming for H(Z | Ẑ) = 0) and then operates entirely in caption space; this is acknowledged but not validated rigorously, and failures of the captioner (omissions, hallucinations, bias) directly impact decisions.
4. By design, the method may pass factually wrong but distribution-typical content and reject surprising yet true items; this limits its suitability for many moderation goals and shifts the risk onto how p_r is chosen.
Questions:
1. How often does H(Z | Ẑ) materially deviate from zero in practice? Do you have end-to-end ablations showing deception reduction vs. caption quality (e.g., swapping Florence for a weaker/stronger captioner)?
2. If an attacker crafts captions (or ASR prompts) to mimic p_r(ẑ), can they evade calibration? Are there any defenses beyond top-ρ filtering?
3. Your experiments mostly reweight label distributions. How does the method handle semantic recombination (e.g., rare object-context pairs) or multi-label captions?
4. Can the framework extend to video (temporal narratives) or mixed-media posts where text and image semantics conflict?
5. In the "disjoint support" case you match OOD detection and report near-perfect scores on misinformation datasets; how realistic is this assumption outside curated text benchmarks?
EditLens Prediction: Fully AI-generated
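Below is a minimal, self-contained sketch of the rejection-sampling pipeline summarized in Strength 2 above. It is illustrative only and not the authors' code: log_p_real, log_p_mix, and log_M are hypothetical stand-ins for the two fine-tuned language-model likelihoods and the rejection-sampling bound; in the paper the captions would come from a captioning model and the likelihoods from trained LMs.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p_real(caption: str) -> float:
    # Stand-in for the log-likelihood of the caption under the LM
    # fine-tuned on trusted (real) captions.
    return -0.10 * len(caption)

def log_p_mix(caption: str) -> float:
    # Stand-in for the log-likelihood under the LM fine-tuned on the
    # observed (mixed real + synthetic) media stream.
    return -0.12 * len(caption)

def accept(caption: str, log_M: float = 0.0) -> bool:
    """Rejection sampling in caption space.

    Each item is kept with probability min(1, p_real / (M * p_mix)),
    so the surviving stream approximately follows the trusted semantic
    distribution rather than the observed one.
    """
    log_ratio = log_p_real(caption) - log_p_mix(caption) - log_M
    return rng.random() < min(1.0, float(np.exp(log_ratio)))

# Usage: captions come from an upstream captioning model, one per media item.
stream = ["a man playing guitar on a stage",
          "a cat wearing a police uniform directing traffic"]
calibrated = [c for c in stream if accept(c)]
print(calibrated)
```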
Title: Semantic Calibration in Media Streams
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Summary: This paper proposes semantic calibration, a novel framework that shifts deepfake and misinformation detection from artifact-based identification to semantic distribution alignment. Instead of distinguishing real versus synthetic content, the method identifies deceptive shifts in semantic information. It employs captioning models to convert multimodal inputs into text, trains two language models to estimate the real versus mixed semantic distributions, and applies a likelihood-ratio-based rejection sampling rule to filter deceptive media (an illustrative scoring sketch follows this review).
Strengths:
1. The work introduces a distributional view of deception, formalizing it via the KL divergence between semantic distributions, and rigorously establishes the limitations of traditional deepfake detection as generation quality improves. This theoretical foundation is both timely and intellectually solid.
2. The proposed semantic calibration offers a level of transparency rarely seen in media-integrity research.
Weaknesses:
1. Experiments are limited to classical datasets (CIFAR, COCO, AG-News, UrbanSound8K, etc.) and artificially simulated shifts, lacking tests on realistic AI-generated or manipulated content such as GenImage [1], DeepfakeBench [2], or LOKI [3].
2. No quantitative or qualitative comparison is provided against advanced AI-generated content detection methods such as HAMMER (multimodal detection) [4] or UniFD (image detection) [5], limiting the evaluation's competitiveness.
3. The framework relies heavily on captioners or semantic encoders to extract text representations. Any bias, hallucination, or semantic drift in these upstream models directly impacts calibration reliability, but this sensitivity is not quantitatively analyzed.
4. Details such as model backbone choices, parameter counts, captioning prompts, and training hyperparameters are insufficiently documented.
References:
[1] GenImage: A million-scale benchmark for detecting AI-generated images. NeurIPS 2023.
[2] DeepfakeBench: A comprehensive benchmark of deepfake detection. NeurIPS 2024.
[3] LOKI: A comprehensive synthetic data detection benchmark using large multimodal models. ICLR 2025.
[4] Detecting and grounding multi-modal media manipulation. CVPR 2023.
[5] Towards universal fake image detectors that generalize across generative models. CVPR 2023.
EditLens Prediction: Moderately AI-edited
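As a purely illustrative reading of the summary above, the two likelihoods in the rejection rule could be obtained by scoring the same caption under two causal language models. The sketch below uses Hugging Face transformers with "gpt2" as a placeholder checkpoint rather than the paper's fine-tuned models; the per-token terms of the resulting log-ratio are also one natural basis for token-level attribution.

```python
# Illustrative only: scoring one caption under two causal LMs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_log_likelihood(model, tokenizer, text: str) -> float:
    """Total log-probability of `text` under a causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean cross-entropy over predicted tokens; scale it back
    # up by the number of predicted tokens to get the summed log-likelihood.
    n_predicted = enc["input_ids"].shape[1] - 1
    return -out.loss.item() * n_predicted

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm_real = AutoModelForCausalLM.from_pretrained("gpt2")  # would be fine-tuned on trusted captions
lm_mix = AutoModelForCausalLM.from_pretrained("gpt2")   # would be fine-tuned on the observed stream

caption = "a man playing guitar on a stage"
log_ratio = (sequence_log_likelihood(lm_real, tokenizer, caption)
             - sequence_log_likelihood(lm_mix, tokenizer, caption))
print(f"log p_real(caption) - log p_mix(caption) = {log_ratio:.3f}")
```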
Title: Semantic Calibration in Media Streams
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary: The paper introduces Semantic Calibration, a theoretical and practical framework that redefines the problem of deepfake detection. Instead of distinguishing real vs. synthetic media through low-level artifacts, the authors argue that the true objective is to reduce semantic deception, i.e., distributional distortions in the meaning conveyed by media streams. The paper proves that artifact-based deepfake detection will eventually fail as generative models approach perfection. It then formalizes deception as the KL divergence between the semantic distribution of observed media and that of real data, and finally proposes a modality-agnostic mitigation strategy: converting media to text via captioning, then filtering samples using rejection sampling based on semantic likelihood ratios derived from two fine-tuned LLMs (the objective and acceptance rule are written out after this review). Extensive experiments across text, image, and audio modalities show consistent reductions in semantic deception and strong explainability via token-level saliency maps. The method is transparent and empirically effective in aligning media semantics with real-world distributions.
Strengths:
1. The paper makes a paradigm shift from simple binary authenticity detection to semantic distribution alignment. The notion that misinformation should be treated as a semantic calibration problem is both novel and timely; this framing may become foundational for next-generation media-integrity systems as artifact cues disappear.
2. The authors formally derive performance bounds (Theorem 1) showing the inevitability of deepfake detection failure under improving generators. The theoretical link between deception and achievable detection accuracy is rigorous and motivates the need for semantic methods.
3. The proposed rejection sampling in semantic space is mathematically clean and interpretable. Using captioning models and LLM likelihood ratios provides explainability, which is a key advantage over opaque moderation algorithms. The token-level saliency analysis demonstrates further interpretability.
Weaknesses:
1. Because semantics are extracted via pretrained captioners (e.g., Qwen-Audio), calibration accuracy inherits their biases and failure modes. Although the authors discuss this limitation, no robustness experiments are shown under noisy or adversarial captions.
2. The experiments simulate semantic shifts via reweighted class distributions; these synthetic setups may be simplified relative to real-world misinformation, which is dynamic, adversarial, and context-dependent. Demonstrating the method on real social-media streams or misinformation datasets (beyond tabular datasets) would strengthen the claim of practical viability.
3. The approach depends critically on a "trusted" dataset to model the real semantic distribution. As the authors acknowledge, this is a strong assumption: biases or incompleteness in the trusted dataset propagate directly to moderation outcomes. The paper does not provide strategies for ensuring fairness or robustness of the dataset itself.
Questions:
1. How would semantic calibration adapt to evolving media semantics (e.g., breaking news or new slang)? Would continual retraining be required to maintain the semantic distribution?
2. Given a threat model where an attacker deliberately crafts content semantically close to the distribution but factually false, can calibration still filter it?
3. How feasible is deploying semantic calibration as a real-time moderation layer given captioning and LLM inference overhead?
4. Could semantic calibration be viewed as a distributional analogue of LLM alignment (e.g., minimizing semantic divergence instead of a reward loss)?
EditLens Prediction: Fully AI-generated
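For reference, the two quantities this review points to (the KL-based deception measure and the rejection-sampling acceptance rule) can be written out as follows. This is a hedged reconstruction from the reviews' descriptions, not the paper's own notation; M denotes the usual rejection-sampling bound.

```latex
% Hedged reconstruction from the reviews' descriptions; notation may differ from the paper.
\[
  \mathrm{Deception}\big(p_{\mathrm{obs}} \,\Vert\, p_r\big)
    = D_{\mathrm{KL}}\!\big(p_{\mathrm{obs}}(\hat z) \,\Vert\, p_r(\hat z)\big)
    = \sum_{\hat z} p_{\mathrm{obs}}(\hat z)\,
      \log\frac{p_{\mathrm{obs}}(\hat z)}{p_r(\hat z)},
\]
\[
  \Pr[\text{accept } \hat z]
    = \min\!\Big(1,\; \frac{p_r(\hat z)}{M\,p_{\mathrm{mix}}(\hat z)}\Big),
  \qquad
  M \;\ge\; \sup_{\hat z} \frac{p_r(\hat z)}{p_{\mathrm{mix}}(\hat z)}.
\]
```

Here ẑ is the caption-space semantic representation, p_r is estimated from the trusted dataset, and p_mix from the observed (mixed) stream.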