ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction   | Count    | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated    | 3 (75%)  | 2.67       | 4.00           | 2910
Heavily AI-edited     | 1 (25%)  | 2.00       | 4.00           | 3395
Moderately AI-edited  | 0 (0%)   | N/A        | N/A            | N/A
Lightly AI-edited     | 0 (0%)   | N/A        | N/A            | N/A
Fully human-written   | 0 (0%)   | N/A        | N/A            | N/A
Total                 | 4 (100%) | 2.50       | 4.00           | 3031
Individual Reviews
SPADE: SEMANTIC-PRESERVING ADAPTIVE DETOXIFICATION OF IMAGES

Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
The paper introduces SPADE, a multimodal dataset of real-world toxic images, each paired with three progressively detoxified variants and corresponding stories. It proposes a sequential fine-tuning approach using ControlNet to generate detoxified images that reduce harmful content while preserving semantic context and visual fidelity. Each variant represents a graded reduction in toxicity, guided by captions embedded into narrative stories. Experiments show that Sequential ControlNet outperforms baselines such as Stable Diffusion and Safe Diffusion in semantic alignment, content preservation, knowledge retention, and visual realism. Together, SPADE and the proposed method establish a benchmark for controlled, context-aware, and safety-aligned image generation.

Strengths:
- The work addresses a pressing safety challenge in text-to-image generation (mitigating toxic content while preserving semantics), which is highly relevant for real-world deployment of these models.
- The paper has two main contributions:
  - A benchmark for detoxification, enabling future research in safe AI, multimodal moderation, and ethically guided image generation.
  - A sequential fine-tuning strategy that offers a practical methodology for incremental toxicity reduction in large-scale generative models.

Weaknesses:
- The paper has limited novelty: the dataset is essentially a data-augmented version of real-world images, and fine-tuning a model in a specific way on this dataset makes it safer. It would be better to present the work as a post-training dataset for T2I/T2V models rather than as a new task and dataset, since it is essentially fine-tuning data.
- The graded variants essentially decompose the task into a sequence of progressively easier subtasks.
- The paper does not compare against other methods for safe T2I generation such as [1, 2].

[1] Gong, Chao, et al. "Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
[2] Yoon, Jaehong, et al. "SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation." The Thirteenth International Conference on Learning Representations, 2025.

Questions:
- How sensitive is the model to the order of variant fine-tuning? Could starting from V3 down to V1 affect semantic preservation differently? (A measurement sketch follows this review.)
- The error modes (hallucination, contextual drift, instruction misinterpretation, artifacts) are discussed. Can the authors quantify the frequency or severity of these errors across variants?

EditLens Prediction: Fully AI-generated
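To make the question about order sensitivity measurable, semantic preservation per fine-tuning stage can be approximated with off-the-shelf CLIP image embeddings. A minimal sketch follows; the checkpoint name and file paths are illustrative assumptions, not taken from the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Public CLIP checkpoint used for illustration only.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_image_similarity(path_a: str, path_b: str) -> float:
    """Cosine similarity of CLIP image embeddings (higher = more similar)."""
    images = [Image.open(p).convert("RGB") for p in (path_a, path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats[0] @ feats[1]).item()

# Compare each generated variant against the toxic source; a V1->V2->V3 run
# and a V3->V2->V1 run could each report this score per stage to answer
# the ordering question. File names are placeholders.
for variant in ("v1.png", "v2.png", "v3.png"):
    print(variant, clip_image_similarity("toxic.png", variant))
```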
SPADE: SEMANTIC-PRESERVING ADAPTIVE DETOXIFICATION OF IMAGES

Soundness: 4: excellent
Presentation: 3: good
Contribution: 4: excellent
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper introduces SPADE, a novel dataset and benchmark for graded, semantic-preserving detoxification of harmful images. Unlike prior binary moderation or red-teaming approaches, SPADE pairs each toxic image with three progressively detoxified variants, along with captions and contextual stories generated via GPT-4 and DALL-E 3. The authors further fine-tune ControlNet using a sequential image-to-image conditioning strategy, allowing gradual reduction of toxicity while preserving semantic and visual coherence. Quantitative evaluation (FID, CLIP similarity, KR/CP metrics) and t-SNE visualizations demonstrate that the proposed Sequential ControlNet surpasses strong baselines (Safe Diffusion, InstructPix2Pix, etc.) in balancing safety and semantics. The paper also provides a thorough dataset analysis, an error taxonomy, and an ethics statement.

Strengths:
1. The paper defines a new and concrete research direction, graded image detoxification, that goes beyond existing binary filtering or red-teaming settings. The notion of progressively reducing toxicity while maintaining semantic coherence is both practically and conceptually valuable.
2. The SPADE dataset is carefully constructed and well documented. The integration of captions, contextual stories, and multiple detoxified variants per image demonstrates attention to detail and awareness of real-world moderation needs.
3. The proposed sequential fine-tuning of ControlNet is well motivated and systematically described. The design choice of conditioning on both the toxic image and the story context provides a clear, interpretable mechanism for controlled detoxification (a conditioning sketch follows this review).
4. The authors employ a wide range of quantitative and qualitative assessments (CLIP similarity, FID, KR/CP metrics, t-SNE analysis, and an error taxonomy), which together give a convincing picture of model behavior.
5. The topic sits at the intersection of safety, multimodal learning, and generative modeling, areas of increasing importance for the ICLR community. The work aligns well with ongoing discussions around ethical content generation and interpretability.

Weaknesses:
1. The dataset and baseline generation rely heavily on proprietary models (GPT-4, DALL-E 3), making full reproduction difficult once APIs change or become restricted.
2. While the paper reports manual checks, the human study is limited in scale and does not quantify inter-annotator agreement or perception consistency across toxicity categories.
3. The sequential fine-tuning strategy is central to the paper, but no direct ablation isolates its contribution relative to single-stage fine-tuning or other conditioning mechanisms.
4. Although the curation process is systematic, the dataset size (≈2.5k toxic images) and domain diversity may be insufficient to claim broad generalization across harm types or cultural contexts.
5. The paper frequently refers to balancing detoxification and semantic preservation but lacks a clear quantitative or perceptual measure of this trade-off, which weakens the empirical grounding of some claims.

Questions:
1. How are copyright and model license restrictions (DALL-E 3, GPT-4) handled for open-sourcing SPADE?
2. Could the authors provide results using open-source models (e.g., SDXL or PixArt) to ensure replicability?
3. Did annotators evaluate human toxicity perception, or only semantic preservation?
4. How sensitive is Sequential ControlNet to domain shift, e.g., cartoon, medical, or political imagery?

EditLens Prediction: Fully AI-generated
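Strength 3 above describes conditioning on both the toxic image and the story context. The paper's training code is not shown, but an inference-time version of sequential conditioning can be sketched with the diffusers library; the checkpoint path, prompts, and strength value below are assumptions, not the authors' configuration.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Hypothetical fine-tuned checkpoint; the paper's weights are not public.
controlnet = ControlNetModel.from_pretrained(
    "spade/controlnet-detox", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("toxic.png").convert("RGB")
# Story-derived prompts for the three graded variants (placeholders).
stage_prompts = [
    "the same scene, mildly softened (V1)",
    "the same scene, further softened (V2)",
    "the same scene, fully detoxified (V3)",
]

variants = []
for prompt in stage_prompts:
    # Each stage is conditioned on the previous stage's output, so toxicity
    # is reduced gradually rather than in a single jump.
    image = pipe(prompt=prompt, image=image, control_image=image,
                 strength=0.6, num_inference_steps=30).images[0]
    variants.append(image)
```

An ablation addressing weakness 3 could compare this staged loop against a single pass that jumps directly to the V3 prompt.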
SPADE: SEMANTIC-PRESERVING ADAPTIVE DETOXIFICATION OF IMAGES

Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper introduces SPADE, a new dataset and benchmark for semantic-preserving, graded detoxification of toxic visual content. SPADE includes 2,500 toxic images with three detoxified versions each and is positioned as the first benchmark for controllable, story-guided visual detoxification. The authors fine-tune ControlNet in a sequential conditioning framework to produce graded detoxification and evaluate results using both automatic metrics (FID, CLIP similarity) and human-aligned measures (Knowledge Relevance, Context Preservation).

Strengths:
The paper formalizes the previously underexplored task of graded image detoxification, highlighting the fundamental trade-off between harm reduction and semantic fidelity. This conceptual framing is both strong and timely, as it extends the current focus on binary content moderation toward a more nuanced, controllable, and ethically grounded generation process.

Weaknesses:
1. The dataset size (2,500 base toxic images) may be too small to serve as a foundation benchmark, especially when split across several harm categories. It would be useful if the authors could provide category distribution statistics or discuss scalability to larger or more diverse sources.
2. All detoxified variants are synthetic (via DALL-E 3), which raises concerns about realism and distributional shift. Some variants (e.g., V3) appear cartoonish, reducing their usefulness for realistic downstream fine-tuning. Clarifying whether realism-preserving metrics (e.g., LPIPS, human ratings) were considered would strengthen this part (a measurement sketch follows this review).
3. The Sequential ControlNet fine-tuning pipeline seems incremental over existing diffusion control techniques; the main novelty lies in the dataset rather than in algorithmic innovation. It would help if the authors could elaborate on how the sequential fine-tuning differs conceptually or experimentally from prior progressive control methods (e.g., T2I-Adapter, Make-A-Scene).

Questions:
1. Since GPT-4 and DALL-E 3 are used to produce captions, stories, detoxified images, and even automated evaluations (KR, CP), how do the authors mitigate potential self-bias or circularity?
2. The Sequential ControlNet approach seems to perform curriculum-like fine-tuning. How does it differ, conceptually or technically, from prior progressive fine-tuning or adapter-based control (e.g., T2I-Adapter, Make-A-Scene)?
3. In several detoxified variants (especially V3), the visual style appears to shift from photorealistic to cartoon-like or over-smoothed renderings. Could the authors clarify why such stylistic drift occurs during detoxification and whether any mechanisms were used to preserve realism? How might future work mitigate this distributional shift between real and detoxified images?

EditLens Prediction: Fully AI-generated
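Weakness 2 suggests realism-preserving metrics such as LPIPS. A minimal sketch with the lpips package is shown below; file names are placeholders, and a lower distance means the synthetic variant stays perceptually closer to the real source.

```python
import lpips  # pip install lpips

# Perceptual distance between the real toxic image and each synthetic
# variant; assumes all images share the same resolution.
loss_fn = lpips.LPIPS(net="alex")  # AlexNet backbone, the package default

src = lpips.im2tensor(lpips.load_image("toxic.png"))
for variant in ("v1.png", "v2.png", "v3.png"):
    var = lpips.im2tensor(lpips.load_image(variant))
    print(variant, loss_fn(src, var).item())
```

A steadily growing LPIPS from V1 to V3 would quantify the cartoonish drift this review describes.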
SPADE: SEMANTIC-PRESERVING ADAPTIVE DETOXIFICATION OF IMAGES

Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper introduces SPADE, a novel dataset for graded, story-guided image detoxification, comprising 2,500 toxic images paired with three progressively detoxified variants, human-aligned captions, and contextual narratives. It proposes Sequential ControlNet as a baseline, demonstrating that multimodal (image + story) conditioning can reduce toxicity while preserving semantic and narrative coherence.

Strengths:
1. The paper introduces SPADE, the first dataset explicitly designed for graded, story-guided image detoxification, filling an important gap in multimodal safety research.
2. The t-SNE and human-aligned metric analyses provide a thoughtful framework for evaluating the trade-off between toxicity reduction and semantic preservation.

Weaknesses:
1. Lack of methodological innovation. The proposed Sequential ControlNet appears to be an engineering-level combination rather than a genuine conceptual contribution. Although the paper claims to introduce a new baseline, its core idea merely involves stage-wise fine-tuning of ControlNet without introducing any new architecture, loss function, or training strategy. Moreover, the comparative experiments fail to include strong and reasonable baselines (as mentioned in Appendix A).
2. Dataset construction heavily depends on black-box APIs. All 7,500 detoxified images are automatically generated by DALL-E 3, yet the paper provides no clear description of prompt control or generation details (Appendix C only shows vague examples without a reproducible batch-generation protocol). The corresponding captions and stories are generated by GPT-4, but the authors do not disclose the temperature, top-p, or system prompt settings, nor do they assess whether GPT-4's own toxicity or bias may have leaked into the dataset.
3. Toxicity quantification is vague and unreliable. The paper reports toxicity scores (Toxic = 79.7, V1 = 73.4, V2 = 54.6, V3 = 35.8) but only provides a footnote stating: "Calculated using a CLIP classifier fine-tuned on the dataset from Hendrycks et al., 2021." However, Hendrycks et al. (2021) is a value-learning text dataset without image-level toxicity annotations, raising concerns about the validity of these toxicity metrics (a scorer sketch follows this review).
4. The core claim of "graded detoxification" lacks controllability verification. Although the paper emphasizes that the detoxification process is graded and controllable, it does not verify whether users can interpolate between toxicity levels. Furthermore, Table 2 shows that caption cosine similarity drops from 1.00 (R) to 0.60 (V3), yet no semantic analysis is provided to explain what content is lost. In reality, V1/V2/V3 represent discrete snapshots rather than a continuous, controllable space, making the approach unsuitable for real-world moderation scenarios that require dynamically adjustable safety grades (as claimed in the Abstract).
5. Ethical risks are understated, overlooking cultural sensitivity and power asymmetry. (1) The paper implicitly assumes that "detoxification = positive" without discussing who defines toxicity. For example, the keyword list (Appendix B.1) includes phrases such as "Fat-shamed girl" and "Obese person," which reflect a body-shaming perspective and treat "fatness" itself as toxic. (2) In Table 7, the prompt "A student being bullied" is detoxified into "accidentally bumped into a locker," which trivializes actual violence and risks secondary harm.

Questions:
None.

EditLens Prediction: Heavily AI-edited
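Weakness 3 questions the footnoted CLIP toxicity scores. For reference, a zero-shot version of such a scorer is easy to sketch; the label prompts and checkpoint below are illustrative, and the paper's fine-tuned classifier would differ.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative label prompts; the choice of wording strongly affects scores,
# which is part of why an undocumented classifier is hard to trust.
labels = ["a violent, harmful, or disturbing image", "a safe, benign image"]

def toxicity_score(path: str) -> float:
    """Probability (in %) assigned to the 'toxic' label by zero-shot CLIP."""
    inputs = processor(text=labels, images=Image.open(path).convert("RGB"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (1, 2)
    return logits.softmax(dim=-1)[0, 0].item() * 100

# File names are placeholders for the original image and its three variants.
for name in ("toxic.png", "v1.png", "v2.png", "v3.png"):
    print(name, f"{toxicity_score(name):.1f}")
```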