|
LU-500: A Logo Benchmark for Concept Unlearning |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper introduces a benchmark for logo unlearning. It focuses on evaluating inference-time unlearning methods on this benchmark and proposes a baseline unlearning method based on prompt editing. Through experiments, it shows that existing inference-time unlearning methods are not effective at unlearning logos.
The presentation of this paper is generally clear, and it provides an interesting correlation analysis between unlearning performance and various logo characteristics (area, location, edge density, etc.).
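For concreteness, that analysis amounts to correlating a per-logo removal score with simple logo statistics; a minimal sketch of what I mean is below (the file name and column names are placeholders of mine, not the authors' code).

```python
# Hypothetical illustration of the correlation analysis described above:
# correlate a per-logo "removal score" with simple logo characteristics.
# File and column names are placeholders, not the authors' data.
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("per_logo_results.csv")  # hypothetical table, one row per logo

for feature in ["logo_area", "center_distance", "edge_density"]:
    rho, p = spearmanr(df[feature], df["removal_score"])
    print(f"{feature}: Spearman rho = {rho:.3f} (p = {p:.3f})")
```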
This paper has the following major limitations:
- The scope of copyright protection is limited. The exclusive focus on logos is too narrow for meaningful copyright protection; other crucial copyrighted elements include characters, protected artworks, patterns, and so on, and the methods may not generalize to these other types of copyrighted content.
- The sole focus on inference-time unlearning methods is a major limitation because it does not represent the full spectrum of unlearning approaches. There is no explanation of why other approaches would not work; fine-tuning and model-manipulation unlearning methods should be compared against, even if they are more computationally demanding.
- The proposed baseline shows fundamental flaws. As Figure 6 shows, residual logos remain clearly visible and brand identities are recognizable even when logos are only partially removed, and the method may not work when implicit brand indicators beyond the logo are present in the prompt.
- The current metrics do not guarantee complete information removal; they do not test against adversarial attempts to recover logos (see the sketch below this list).
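To make the last point concrete, the evaluation could include a worst-case probe over paraphrased or indirect prompts, roughly along the lines below. This is only an illustration under my own assumptions (a diffusers SD3 pipeline, an off-the-shelf CLIP model, and made-up prompts); it is not code or prompts from the paper.

```python
# Sketch of an adversarial recovery probe: try paraphrased / indirect prompts
# against the "unlearned" model and report the worst case (highest CLIP
# similarity between the generated image and a textual description of the logo).
# Model names, prompts, and the target text are illustrative assumptions.
import torch
from diffusers import StableDiffusion3Pipeline
from transformers import CLIPModel, CLIPProcessor

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

adversarial_prompts = [
    "a coffee cup with the logo of the famous Seattle coffee chain",
    "a sneaker with a white swoosh-shaped check mark on the side",
    "a smartphone with a bitten-fruit emblem on its back",
]
target_text = "the official logo of the targeted brand"  # placeholder

worst = 0.0
for prompt in adversarial_prompts:
    image = pipe(prompt, num_inference_steps=28).images[0]
    inputs = proc(text=[target_text], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        score = clip(**inputs).logits_per_image.item()
    worst = max(worst, score)
print(f"worst-case CLIP logit over adversarial prompts: {worst:.2f}")
```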
The front-page figure needs some work: the layout is cluttered and does not clearly convey the main information.
Why focus exclusively on logos rather than a broader range of copyrighted visual content? Have you tested whether your benchmark and methods generalize to other types of copyrighted material (e.g., characters, artistic styles, patented designs)? |
Fully human-written |
|
LU-500: A Logo Benchmark for Concept Unlearning |
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper proposes a logo unlearning benchmark, LU-500, along with five metrics designed for quantitative evaluation of logo unlearning efficacy. Furthermore, a prompt-based unlearning method, ProLU, is provided.
1. Well-motivated contribution. This paper fills a gap: prior benchmarks largely focus on natural images and stylistic concepts. Logos are intellectual property that diffusion models often memorize, and they are highly relevant to both safety and legal compliance.
2. Proposed benchmark. The proposed LU-500 contains 500 logos across 10 commercial categories.
3. Five quantitative metrics are proposed for evaluating logo unlearning efficacy.
1. Stronger concept unlearning baselines (e.g., [1]) should be included in the benchmark evaluation.
2. Limited benchmark scale and diversity. Despite its value, 500 logos is small relative to the variety of commercial marks, and many logos share similar geometric primitives, which might cause evaluation saturation.
3. Ambiguous boundary between memorization and semantic retention. Some metrics (e.g., CLIP-based LS) may not effectively differentiate between semantic similarity ("a bitten apple") and literal logo reconstruction (the "Apple Inc." logo). A clearer delineation between concept-level leakage and pixel-level memorization would improve interpretability; see the illustrative sketch after the reference below.
[1] Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models, NeurIPS 2024
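To illustrate point 3, here is a minimal sketch of how a CLIP-based score could be paired with a pixel-level SSIM on the (assumed known) logo region, so that concept-level leakage and literal reconstruction can be told apart. The image paths, crop box, and brand text are placeholders of mine, not from the paper.

```python
# Sketch: score a generated image against a brand concept with CLIP (semantic)
# and against a reference logo crop with SSIM (pixel-level). A high CLIP score
# with a low SSIM would suggest concept-level leakage rather than literal
# memorization. All paths and the crop box are placeholders.
import numpy as np
import torch
from PIL import Image
from skimage.metrics import structural_similarity as ssim
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

generated = Image.open("generated.png").convert("RGB")
reference_logo = Image.open("reference_logo.png").convert("L").resize((128, 128))

# Semantic similarity to the brand concept.
inputs = proc(text=["the Apple Inc. logo"], images=generated,
              return_tensors="pt", padding=True)
with torch.no_grad():
    clip_score = clip(**inputs).logits_per_image.item()

# Pixel-level similarity of the (assumed known) logo region to the reference.
logo_crop = generated.crop((50, 50, 178, 178)).convert("L").resize((128, 128))
pixel_score = ssim(np.array(logo_crop), np.array(reference_logo), data_range=255)

print(f"CLIP logit: {clip_score:.2f}, logo-region SSIM: {pixel_score:.3f}")
```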
Check the above weakness section. |
Lightly AI-edited |
|
LU-500: A Logo Benchmark for Concept Unlearning |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. |
The paper proposes LU-500, a benchmark for “logo unlearning” in text-to-image models, built from 9,584 prompts across Fortune Global 500 brands with explicit (LUex-500) and implicit (LUim-500) tracks.
The benchmark design is clear. LU-500 isolates logo unlearning with two realistic prompting modes and a sizable, vetted prompt set.
1. I think this work essentially belongs to "prompt engineering": not only is LU-500 built from prompts, but ProLU itself is three prompt-based LLM agents. Unfortunately, I do not see any algorithmic innovation in this work.
2. This work relies heavily on GPT-4o to build both the benchmark and the ProLU agents, yet Appendix A claims LLMs were used only to polish writing; this is a glaring inconsistency.
3. The authors claim to "propose" five metrics as a core contribution in the introduction, but CLIPScore and SSIM are standard existing metrics, so there is nothing new here.
See the weaknesses section of my review. |
Fully human-written |
|
LU-500: A Logo Benchmark for Concept Unlearning |
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully. |
The work introduces LU-500, a new benchmark designed to evaluate concept unlearning methods for company logos within text-to-image diffusion models.
The dataset contains prompts derived from Fortune Global 500 companies and is divided into explicit (LUex-500) and implicit (LUim-500) tracks.
The authors propose five quantitative metrics (CLIPScore, LogoScore, LogoSSIM, ImageScore, ImageSSIM) to assess both local logo removal and global image preservation across pixel and latent spaces.
Experiments compare inference-time unlearning methods (NP, SLD, SEGA) and fine-tuning approaches (ESD, Forget-Me-Not) on Stable Diffusion 3 Medium.
All baselines perform poorly, motivating the authors’ prompt-based baseline, ProLU, which edits prompts through a three-agent pipeline (Remover, Reflector, Checker). ProLU achieves stronger logo removal but somewhat weaker background preservation.
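My reading of that pipeline is roughly the loop sketched below; the agent instructions are my own paraphrase, and the `agent` helper is just a thin wrapper around a chat LLM call (GPT-4o, which the paper builds on), not the authors' implementation.

```python
# Rough sketch of a three-agent prompt-editing loop of the kind described above
# (Remover / Reflector / Checker). The instructions are paraphrased by me and
# do not reproduce the authors' actual agent prompts.
from openai import OpenAI

client = OpenAI()

def agent(instruction: str, text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": instruction},
                  {"role": "user", "content": text}],
    )
    return resp.choices[0].message.content

def prolu_style_rewrite(prompt: str, max_rounds: int = 3) -> str:
    edited = prompt
    for _ in range(max_rounds):
        # Remover: strip explicit and implicit references to the protected logo.
        edited = agent("Remove all explicit and implicit references to brand logos.", edited)
        # Reflector: keep the rest of the scene described faithfully.
        edited = agent("Restore any non-logo scene content lost in the previous edit.", edited)
        # Checker: decide whether the prompt could still elicit the logo.
        verdict = agent("Answer YES or NO: could this prompt still elicit the brand logo?", edited)
        if verdict.strip().upper().startswith("NO"):
            break
    return edited
```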
The paper also performs a correlation analysis between unlearning effectiveness and image characteristics (area, location, fractal dimension) and finds only weak relationships.
- New benchmark focusing on logo unlearning. This is a neglected but socially relevant copyright-protection task.
- The work is clearly written and easy to follow.
- The five proposed metrics systematically separate local logo removal from global fidelity, going beyond binary success rates.
- LU-500 focuses only on Fortune 500 logos; small-brand or non-Latin logos are not covered.
- Reliance on CLIP and SSIM metrics raises concerns about semantic leakage or bias: low CLIPScore may not perfectly reflect successful logo removal. Human evaluation or perceptual studies would strengthen claims of “logo removal.”
- The benchmark and metrics are valuable, but ProLU mainly repurposes LLM-based prompt rewriting without clear algorithmic innovation beyond dataset design.
See weaknesses above. |
Heavily AI-edited |