ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
| --- | --- | --- | --- | --- |
| Fully AI-generated | 1 (25%) | 2.00 | 5.00 | 2591 |
| Heavily AI-edited | 1 (25%) | 4.00 | 3.00 | 1955 |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 2 (50%) | 3.00 | 3.50 | 2262 |
| Fully human-written | 0 (0%) | N/A | N/A | N/A |
| Total | 4 (100%) | 3.00 | 3.75 | 2268 |
Individual Reviews
Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks

Soundness: 2 (fair) | Presentation: 2 (fair) | Contribution: 2 (fair) | Rating: 2 (reject) | Confidence: 4 (confident, but not absolutely certain)

Summary: This manuscript investigates how to enhance the quality of object-centric image generation when training data is limited (e.g., fewer than 3k images) or contains degraded images. The authors propose a hybrid GAN-diffusion framework that integrates a discriminator into the intermediate denoising steps of the diffusion process to improve visual fidelity. An Additional Noise Perturbation Module is also introduced to steer the model's focus toward predefined bounding boxes containing key objects. The proposed method is validated on complex scene datasets, including Cityscapes-Pedestrian, Traffic-Signs, and MS-COCO (Potted Plant), demonstrating its effectiveness in generation tasks.

Strengths: The research problem addressed in this manuscript, generation with limited data, is highly meaningful. The approach of integrating a GAN discriminator to enhance quality is well justified, and the idea of leveraging bounding boxes to prioritize the generation quality of key objects is particularly suitable for complex scene generation. Experimental results demonstrate a clear improvement in generation quality compared to existing methods.

Weaknesses: The experimental analysis appears somewhat fragmented and would benefit from consolidation and restructuring. The current evaluation is incomplete, as it fails to demonstrate the method's effectiveness in downstream tasks, particularly as data augmentation. Moreover, the study lacks intuitive assessments of generation quality, such as visual comparisons of generated images. Discussions and comparisons with existing methods for generation with limited data are also notably absent.

Questions:
1. The manuscript should discuss recent work on few-shot sample generation, which is highly relevant to the presented approach.
2. Several notation issues are present in Equations (7) and (8). For instance, the time step t is missing in Equation (8), and the origin of the variable x̂ is not defined.
3. Both the diffusion loss and the reconstruction loss pertain to reconstruction. Please clarify the distinct roles and motivations for including both terms in the objective function.
4. While the introduction claims that the method is intended for augmenting downstream detectors, no experiments evaluate the utility of the generated samples in such downstream tasks.
5. How does the performance vary with different scales of training data (e.g., 100 or 1,000 samples)? An analysis of the method's sensitivity to training set size is needed.
6. The experiments primarily follow a single-object-per-dataset setting (e.g., pedestrians, traffic signs, potted plants). The applicability of the method to multi-object generation scenarios should be discussed, as this is critical for complex real-world applications.

EditLens Prediction: Lightly AI-edited
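
For reference, question 3 above asks how the diffusion and reconstruction terms differ, and question 2 asks where x̂ comes from. A standard three-term hybrid objective of the kind these reviews describe could be written as the sketch below; the weights and the closed form of x̂₀ are illustrative assumptions, not the paper's Equations (7)-(8).

```latex
% Hedged sketch of a standard hybrid objective; notation is assumed,
% not taken from the paper under review.
\mathcal{L}_{\text{total}}
  = \mathbb{E}_{x_0, t, \epsilon}\left[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\right] % diffusion loss
  + \lambda_{\text{rec}} \, \lVert x_0 - \hat{x}_0 \rVert_1                                      % reconstruction loss
  + \lambda_{\text{adv}} \, \mathbb{E}\left[-\log D(\hat{x}_0)\right],                           % adversarial loss
\qquad
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}
```
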
Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks

Soundness: 3 (good) | Presentation: 3 (good) | Contribution: 3 (good) | Rating: 4 (marginally below the acceptance threshold) | Confidence: 3 (fairly confident)

Summary: This paper proposes Focused Diffusion-GAN (FDGAN), a hybrid generative model designed for object-centric image generation in low-data regimes. The key innovation is integrating a GAN discriminator into intermediate denoising stages of a diffusion model through an Additional Noise Perturbation Module (ANPM). ANPM selectively activates adversarial training at specific timesteps and applies targeted Gaussian noise within bounding-box regions to guide the model's attention toward objects. The authors evaluate FDGAN on three small datasets (Cityscapes-Pedestrian, Traffic-Signs, and MS-COCO potted plants), demonstrating improvements in perceptual quality and reduced overfitting compared to GAN-only, diffusion-only, and hybrid baselines.

Strengths:
- The selective integration of adversarial training at intermediate diffusion timesteps (t < t_early) is an interesting approach that differs from prior hybrid methods.
- Detailed ablation studies demonstrate the effectiveness of each component (GAN/ANPM, reconstruction losses, weighting schemes).

Weaknesses:
- The evaluation is restricted to only three small datasets, all at 256×256 resolution. Generalizability to other domains, higher resolutions, or multi-class scenarios remains unclear.
- The main part of the method is performing GAN training at intermediate diffusion timesteps, which can be regarded as hyper-parameter tuning. The justification (theoretical or empirical) is insufficient, resulting in limited novelty.
- Although the bounding-box noise is highlighted in the abstract, there is no ablation study on it.

Questions:
- See Weaknesses.
- Why use the diffusion loss instead of a consistency loss in training? A consistency loss seems to align more closely with the GAN objective, so the use of the diffusion loss here is puzzling.

EditLens Prediction: Lightly AI-edited
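
A minimal sketch of the timestep gating this review describes (adversarial supervision only once t < t_early), assuming a PyTorch-style epsilon-prediction denoiser; t_early, lambda_adv, and the exact discriminator interface are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def hybrid_step(denoiser, discriminator, x_t, t, noise, alpha_bar,
                t_early=400, lambda_adv=0.1):
    """One training step combining the diffusion loss with a gated GAN loss.

    alpha_bar: 1-D tensor of cumulative noise-schedule products, indexed by t.
    t_early and lambda_adv are assumed hyper-parameters, not the paper's values.
    """
    eps_pred = denoiser(x_t, t)
    # Standard noise-prediction diffusion loss over all timesteps.
    loss = torch.mean((eps_pred - noise) ** 2)

    active = t < t_early                        # per-sample adversarial gate
    if active.any():
        # Usual x0-from-epsilon conversion to get a clean-image estimate.
        ab = alpha_bar[t].view(-1, 1, 1, 1)
        x0_pred = (x_t - (1.0 - ab).sqrt() * eps_pred) / ab.sqrt()
        logits = discriminator(x0_pred).squeeze(-1)
        # Non-saturating generator loss, applied only to gated samples.
        adv = F.softplus(-logits)
        loss = loss + lambda_adv * (active.float() * adv).mean()
    return loss
```
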
Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks

Soundness: 1 (poor) | Presentation: 1 (poor) | Contribution: 2 (fair) | Rating: 2 (reject) | Confidence: 5 (absolutely certain)

Summary: The paper introduces FDGAN, a method that combines GAN and diffusion models for low-data, object-aware synthesis aimed at augmenting downstream object detectors such as YOLO and DETR. While the idea of integrating these models is potentially interesting, the paper fails to provide a clear big picture or logical foundation for the approach. It focuses primarily on implementation details without explaining the underlying principles, and the experimental validation does not adequately support the claims made in the introduction.

Strengths: The attempt to merge GAN and diffusion models for data augmentation in low-data scenarios is a relevant and timely topic. The paper presents a structured method with multiple loss functions, which could be a basis for further development.

Weaknesses:
- Lack of conceptual clarity and big picture: the paper does not sufficiently explain the core principles behind fusing GAN and diffusion models. It describes how the models are combined but fails to justify why this fusion is theoretically sound or beneficial, which makes it difficult to assess the novelty and contribution of the work.
- Excessive repetition in citations: redundant citations reduce readability and professionalism. For example, in the first paragraph of page 2, "Karras et al., 2020a" is cited four times. Better citation management is needed.
- Insufficient experimental validation: the introduction claims that FDGAN is an object-aware synthesizer for augmenting detectors like YOLO and DETR, but the experiments provide no evidence for this. There are no results demonstrating improved performance on downstream detection tasks, which undermines the paper's main motivation.
- Methodological justification: the method section introduces three loss functions but does not explain the rationale for their selection or combination. Without a principled discussion of why these losses are chosen and how they interact, the approach appears ad hoc and lacks theoretical grounding.

Questions:
- Can the authors provide a more detailed theoretical explanation for the fusion of GAN and diffusion models? How does object-aware synthesis specifically benefit downstream detectors?
- The authors should include experiments that evaluate FDGAN's impact on detector performance (e.g., using metrics like mAP for YOLO or DETR) to validate the claims.
- Please justify the combination of the three loss functions: what is the principle behind each loss, and how do they collectively contribute to the model's objectives?

EditLens Prediction: Fully AI-generated
Focused Diffusion GAN: Object-Centric Image Generation Using Integrated GAN and Diffusion Frameworks

Soundness: 3 (good) | Presentation: 2 (fair) | Contribution: 2 (fair) | Rating: 4 (marginally below the acceptance threshold) | Confidence: 3 (fairly confident)

Summary: This paper proposes Focused Diffusion-GAN (FDGAN), a hybrid generative model that integrates a GAN discriminator into a diffusion model at intermediate denoising stages. The method introduces an Additional Noise Perturbation Module (ANPM) that selectively activates the adversarial branch when samples are sufficiently denoised and applies localized noise within bounding-box regions to enforce object-centric focus. The paper targets low-data object-centric regimes, evaluating on three small datasets (Cityscapes-Pedestrian, Traffic-Signs, and COCO "potted plant"). Experimental results demonstrate improved perceptual fidelity and reduced overfitting compared to diverse baselines.

Strengths:
1. Task focus: the focus on limited-data, object-centric scenarios is well motivated and practical (e.g., privacy-blurred faces, small datasets).
2. Comprehensive evaluation: benchmarks include both GANs and diffusion models, using DINOv2-based metrics as well as traditional FID/Precision/Recall.

Weaknesses:
1. Marginal FID improvements: the proposed method performs worse than Diffusion-GAN on FID across all datasets.
2. Novelty scope: the hybridization of diffusion models and GANs has been explored before. The core novelty lies mainly in localized noise perturbation (ANPM) and timestep scheduling, which may be seen as incremental.
3. Effectiveness evidence: since FDGAN aims to be "a low-data, object-aware synthesizer for augmenting downstream detectors (e.g., YOLO/DETR)", including downstream detection fine-tuning results would strengthen the claims.

Questions:
1. How sensitive is FDGAN to the choice of the timestep threshold $t_\text{early}$ and noise strength $\gamma$ in ANPM?
2. Can the ANPM mechanism generalize to non-bounding-box settings (e.g., segmentation masks or text prompts)?
3. Sections 4.1 and 4.2 share the same table (Table 1) without explicitly referencing it, and the ordering of models in the table is chaotic, which hampers performance comparison and analysis. Restructuring is recommended.

EditLens Prediction: Heavily AI-edited
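
Question 1 above concerns $t_\text{early}$ and $\gamma$; for concreteness, the localized bounding-box perturbation that ANPM is described as applying might look like the minimal sketch below, where the (x1, y1, x2, y2) pixel-box format and the default gamma are assumptions rather than the paper's specification.

```python
import torch

def anpm_perturb(x_t: torch.Tensor, bbox, gamma: float = 0.1) -> torch.Tensor:
    """Inject extra Gaussian noise of strength gamma inside the bounding box.

    x_t: noisy image batch of shape (B, C, H, W); bbox: (x1, y1, x2, y2) in pixels.
    Both the box format and gamma's default are illustrative assumptions.
    """
    x1, y1, x2, y2 = bbox
    mask = torch.zeros_like(x_t)
    mask[..., y1:y2, x1:x2] = 1.0          # 1 inside the box, 0 elsewhere
    return x_t + gamma * mask * torch.randn_like(x_t)
```
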