ICLR 2026 - Reviews

Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 2 (50%) | 4.00 | 5.00 | 4225 |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 0 (0%) | N/A | N/A | N/A |
| Fully human-written | 2 (50%) | 4.00 | 2.50 | 3549 |
| Total | 4 (100%) | 4.00 | 3.75 | 3887 |
**IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset**

Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary: This paper introduces IDSPACE, a novel framework for generating high-quality, customizable synthetic identity documents, accompanied by a substantial open-source dataset. The core strength lies in its ability to produce diverse and realistic synthetic IDs (spanning various countries, ethnicities, and environmental backgrounds) while maintaining high fidelity to target-domain characteristics. This is achieved through a model-guided Bayesian optimization approach that simultaneously ensures both visual similarity and model-prediction consistency, even in few-shot scenarios, outperforming existing GAN-based methods like StyleGAN. The authors also demonstrate the utility of this data for training fraud detection models and validate the "realism" of the generated data using large language models like GPT-4.

Strengths:
1. The most crucial contribution is the provision of a customizable, large-scale, open-source synthetic identity document dataset. This directly tackles the persistent challenge of data scarcity and privacy concerns in identity document fraud detection, offering a valuable resource for the research community. The framework's ability to generate diverse synthetic ID documents based on customizable parameters (e.g., different countries, ethnicities, and environmental backgrounds) is a major strength. This flexibility is essential for comprehensive benchmarking and for evaluating fraud detection models across varied real-world conditions.
2. The use of model-guided Bayesian optimization, integrating both image similarity (SSIM) and model-prediction consistency, is highly effective. It ensures that generated documents, while allowing for detail variations, maintain overall semantic consistency (e.g., preserving user identity) with the target domain. This alignment mechanism is key to the data's utility.
3. The paper provides extensive empirical evidence for the effectiveness of the generated data in training fraud detection models. The comprehensive comparisons, particularly highlighting IDSPACE's superior consistency even with few samples compared to GAN-based baselines like StyleGAN, underscore the method's robustness and practical advantage. The use of large models (e.g., GPT-4) to verify the "realism" or stealthiness of the generated data adds a unique and compelling layer of validation, demonstrating that the synthetic outputs are difficult for advanced AI to distinguish from real ones.
4. The demonstrated capability to produce high-consistency synthetic data from a minimal number of original samples is a critical advantage, making the framework cost-effective and applicable in privacy-sensitive, low-resource settings.

Weaknesses:
1. Diffusion Model Exploration: The paper does not explicitly discuss or compare the performance of IDSPACE against state-of-the-art diffusion models for synthetic identity document generation. Given the recent advancements and high fidelity of diffusion models in image synthesis, this omission leaves a gap in understanding IDSPACE's position relative to this powerful generative paradigm.
2. Generative Fraud Paradox and Control Mechanisms: A significant concern arises from the method's ability to generate highly realistic, new types of fake IDs (beyond traditional crop-and-move or inpainting). If fraud detection models are trained on IDSPACE's non-fraudulent synthetic data, they might inadvertently learn to treat generative artifacts as legitimate, making them vulnerable to this new class of generative fraud. The paper does not address how models trained on IDSPACE data would detect such "generative" forged documents, nor does it propose control mechanisms to prevent misuse of the framework for creating undetectable forgeries (e.g., when generating IDs with fabricated personal information). This ethical and practical dilemma warrants deeper discussion and potential solutions.
3. Dynamic Scene Generation: The current framework focuses on static image generation. However, in real-world remote identity verification, documents are often captured in dynamic video streams, involving subtle movements, varying lighting, and interactions with the environment. The absence of functionality to synthesize identity documents within such dynamic scenarios limits the applicability of the dataset for training and evaluating models in more challenging, realistic conditions.

Questions: See the weaknesses above.

EditLens Prediction: Fully AI-generated
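As an aside, the GPT-4 "realism" check this review praises can be made concrete. The sketch below is one plausible way to run such a stealthiness probe, not the paper's actual protocol: it assumes the `openai` Python client, a vision-capable model (`gpt-4o` is an assumption; the review only says GPT-4), and a hypothetical image path.

```python
# Illustrative sketch of an LLM-based realism/stealthiness check of the kind
# the review describes; NOT the paper's protocol. Assumes the `openai`
# package with OPENAI_API_KEY set in the environment.
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical path to one generated sample.
with open("synthetic_id.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # assumption: the review only mentions "GPT-4"
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Is this identity-document photo a genuine capture or a "
                     "synthetic/generated image? Answer REAL or SYNTHETIC."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)

# Over many samples, the fraction of synthetic images judged REAL gives a
# simple stealthiness score.
print(resp.choices[0].message.content)
```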
**IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset**

Soundness: 2: fair
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary: The paper proposes IDSPACE, a framework to generate synthetic ID documents for training and evaluating document-authentication systems. It separates metadata (name, country, age, background, etc.) from control parameters (rendering/tamper effects) and then uses Bayesian Optimization (BO) to tune those parameters so that synthetic documents "behave like real ones" according to perceptual similarity (SSIM) and model-prediction consistency. The authors also release a large dataset with about 180k template and 180k scanned images of 10 European IDs.

Strengths: The paper is fairly well written and clear about its pipeline. The authors clearly put a lot of engineering work into the rendering system and dataset. Splitting "metadata" and "control parameters" is tidy; I can see how that helps in producing balanced or controlled sets (say, the same ID with different backgrounds or lighting). Also, if someone genuinely can't share real ID images due to privacy, a synthetic set could be convenient for testing.

Weaknesses: I'll be blunt: the problem this paper solves doesn't feel significant enough for a research venue like ICLR. It reads more like a software engineering or product demo paper.
1. **The core problem is weak.** The whole paper is built around "checking whether an ID image *looks* real or fake." But modern ID verification systems don't rely on that alone; they combine **live selfie + liveness**, **OCR + database checks**, and **cryptographic MRZ/barcode verification**. Once you have those, this "visual authenticity" check adds very little. In many cases, you can just match the text and face to the DB, and you're done.
2. **Over-selling small tricks.** The "two key innovations" from the abstract, (1) separating metadata/control parameters and (2) optimizing parameters with Bayesian Optimization, sound fancy, but in practice that's just normal pipeline design plus Optuna-style tuning. There's no conceptual novelty or learning contribution. It's mostly sugar coating over standard components.
3. **Feels like a tool demo, not research.** Lines like "IDSPACE enables model-guided synthesis" (Section 3) make it sound like a new ML idea, but it's really a parameter-tuning loop wrapped around a rendering engine. This could be a great engineering blog post or experiment, but not a strong scientific contribution.
4. **Synthetic is not real, and it won't generalize.** They assume synthetic ID tampering detection can train robust models. But this kind of model tends to fail on **unseen manipulation algorithms**, exactly what attackers will use. So a model trained on their dataset could collapse the moment a new generator or Photoshop trick appears.

Questions: Honestly, I'm struggling to see how this work could be improved without rethinking it from the ground up. The current motivation and problem framing ("synthetic image-based document authenticity") feel too narrow and disconnected from how real-world ID verification is actually done. That said, here are the only questions that might change my mind if addressed clearly:
1. **Practical necessity:** Can the authors explain concrete scenarios where document-image authenticity still matters *even when* OCR → database checks, chip/MRZ verification, and selfie+liveness pipelines exist? Who exactly would use this, and why is synthetic doc-auth better than existing data-augmentation or template-rendering pipelines?
2. **Attack realism:** Most real fraud uses *real* faces and text but manipulates fields, print-scan loops, or replays. Why would a model trained on synthetic tamper patterns generalize to such attacks? Have you tested unseen manipulations (e.g., a new generator, Photoshop edits, reprint attacks)?
3. **Novelty and scope:** What prevents this from being just "Optuna on top of a rendering engine"? The two main "innovations" (metadata separation + BO search) don't seem new. If you disagree, please make the distinction explicit: what is the scientific contribution beyond engineering?

EditLens Prediction: Fully AI-generated
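For concreteness, the "Optuna-style tuning" this reviewer alludes to might look roughly like the minimal sketch below. This is an illustration, not the authors' code: `render_document`, `guide_model`, and `real_target` are hypothetical stand-ins for IDSPACE's rendering engine, pre-trained guide model, and real target sample, and the equal weighting of the two objective terms is an assumption.

```python
# Minimal sketch of a BO loop over rendering/control parameters with an
# SSIM + model-prediction-consistency objective, as the reviews describe it.
import numpy as np
import optuna
from skimage.metrics import structural_similarity as ssim

# Hypothetical stand-ins so the sketch runs end to end; IDSPACE's actual
# renderer and fraud-detection guide model would replace these.
rng = np.random.default_rng(0)
real_target = rng.integers(0, 256, size=(128, 200, 3), dtype=np.uint8)

def render_document(params):
    # Stand-in "renderer": noise level stands in for all rendering effects.
    noisy = real_target + rng.normal(0, params["noise_std"] * 255, real_target.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def guide_model(img):
    # Stand-in "fraud detector": returns a pseudo-probability from image stats.
    return np.array([img.mean() / 255.0])

def objective(trial):
    # Sample low-level control parameters (the paper's examples: noise, blur, font).
    params = {
        "blur_sigma": trial.suggest_float("blur_sigma", 0.0, 3.0),
        "noise_std": trial.suggest_float("noise_std", 0.0, 0.1),
        "font_scale": trial.suggest_float("font_scale", 0.8, 1.2),
    }
    synthetic = render_document(params)
    # Visual term: how far the synthetic image is from the real target.
    visual_term = 1.0 - ssim(synthetic, real_target, channel_axis=-1, data_range=255)
    # Consistency term: the guide model should score both documents alike.
    consistency_term = np.abs(guide_model(synthetic) - guide_model(real_target)).mean()
    return visual_term + consistency_term  # equal weights assumed

study = optuna.create_study(direction="minimize")  # TPE sampler by default
study.optimize(objective, n_trials=100)
print(study.best_params)
```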
**IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset**

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: The paper presents IDSPACE, a system for producing realistic synthetic identity document images tailored to a specific domain using few-shot learning. The framework treats document creation as a combination of controllable parameters and high-level metadata. It applies Bayesian Optimization to automatically tune these parameters so that the resulting synthetic images match the appearance and prediction patterns of real target documents. This functions as a flexible data generator: users can define desired document attributes, and the system automatically aligns low-level visual properties to the target domain. The optimization process is guided by pre-trained fraud detection models to ensure that both visual fidelity and semantic consistency are achieved. The authors claim the experiments show that IDSPACE produces synthetic data with higher realism and cross-model consistency than baselines. The authors also released a synthetic ID dataset.

Strengths:
1. The paper tackles a real problem regarding public ID datasets. The main contribution of IDSPACE is a tunable method that can flexibly generate IDs based on user inputs, which would be very useful for controlled and targeted analysis of existing fraud-detection models' capabilities. Furthermore, IDSPACE releases a synthetic dataset that can be utilized for fraud detection tasks.
2. Allowing users to specify entity metadata and capture conditions is a practical feature. It makes the synthetic data more relevant and interpretable than the outputs of other generation models.
3. Since all identities and content are synthetic, privacy is preserved. This is crucial for identity-based datasets. Furthermore, the authors explicitly note that no real personal information is used.

Weaknesses:
1. I am doubtful about judging the performance of the technique on the model-consistency evaluation alone. Here the generation is optimized using an objective encompassing SSIM and model consistency, and then evaluated with model consistency as well. While I appreciate that this evaluation uses different models as guidance and checks consistency across different models (essentially showing model invariance of the proposed method), this type of evaluation isn't robust enough for fraud-detection systems that should depend on fine-grained nuances. I would suggest conducting a human study on correctness across the different baselines.
2. Overall, the experiments and baselines compared against are limited. For example, in Table 2 the baselines are CycleGAN and BO with an SSIM-only objective. However, generative models such as diffusion-based approaches aren't used. Furthermore, methods such as IDNet/DocXPand that utilize similar approaches should also be considered. It is also unclear how much the BO tuning actually helps. Furthermore, there should be experiments comparing different optimization strategies to show the validity of using BO tuning for this specific setting.
3. The paper's novelty lies in decoupling some metadata to be user-controlled and in including model-prediction consistency in the BO objective (BO is already done in IDNet, but with less user control and only SSIM). Furthermore, they show that their method is better using model-prediction consistency, but this is the same metric they optimized for in BO. Overall, the novelty of the contribution is incremental, but there is immense utility to the contributions, as this is a user-controlled dataset generation framework which can be used to generate specific target scenarios. However, the paper didn't show evaluation or specific use cases in this regard.

Questions:
1. Could the authors clarify how they ensure that model-prediction consistency, which is used both as an optimization objective and an evaluation metric, does not bias the results toward self-validation? Have the authors considered any independent metrics (e.g., human perceptual evaluation, FID/LPIPS) to assess the correctness of the generation process?
2. Is there any specific reason only CycleGAN and BO (with SSIM) are used as baselines? Can the authors show results for diffusion models or other generative models (preferably trained on similar data)?
3. Are there other optimization techniques the authors utilized for experimentation? Can the authors show some baselines regarding that?
4. Can the authors demonstrate specific user-controlled generation scenarios that highlight the practical utility of the decoupled metadata and automatic tuning? Some experiments showcasing this utility would also be great.

EditLens Prediction: Fully human-written
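On this reviewer's point about optimizer-independent metrics: LPIPS, unlike SSIM, would not be part of the BO objective, so it offers exactly the kind of outside check question 1 asks for. Below is a minimal sketch, assuming the pip packages `lpips`, `torch`, and `scikit-image`; the image pair is a hypothetical stand-in for a real document and its IDSPACE-generated counterpart.

```python
# Illustrative sketch of optimizer-independent evaluation, not the paper's
# evaluation code.
import lpips
import numpy as np
import torch
from skimage.metrics import structural_similarity as ssim

# Hypothetical image pair; in practice these would be a real target document
# and a generated counterpart, loaded as HxWx3 uint8 arrays.
rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(256, 384, 3), dtype=np.uint8)
fake = rng.integers(0, 256, size=(256, 384, 3), dtype=np.uint8)

# SSIM: the similarity term already inside the BO objective.
print("SSIM:", ssim(real, fake, channel_axis=-1, data_range=255))

def to_lpips_tensor(img):
    # LPIPS expects NCHW float tensors scaled to [-1, 1].
    t = torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0
    return t.unsqueeze(0)

# LPIPS: a learned perceptual distance that was not part of the tuning
# objective, so the BO loop cannot game it in the same way.
loss_fn = lpips.LPIPS(net="alex")  # downloads AlexNet weights on first use
print("LPIPS:", loss_fn(to_lpips_tensor(real), to_lpips_tensor(fake)).item())
```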
**IDSPACE: A Model-Guided Synthetic Identity Document Generation Framework and Dataset**

Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper introduces IDSPACE, a framework for generating synthetic ID documents from a small number of real samples. The authors tune the generation parameters (such as noise, font, and blur) using a Bayesian Optimization method that determines how the parameters need to be adjusted until the guide model produces the same prediction for real and synthetic documents, and then use these parameters to generate synthetic data. The authors demonstrate that models trained on the IDSPACE dataset generalize better than existing methods.

Strengths: The paper uses Bayesian Optimization (BO) for targeted updates of the synthetic data generation parameters based on the performance of the guide model. The generation parameters are controllable and tuned to adapt to cases where guide models fail, and the experiments show that the data generated in this way helps detector models generalize better. The weaknesses of the guide models can potentially give interesting insight into which sets of parameters are difficult to predict for which types of model.

Weaknesses: The data generation method is limited by the expressiveness of the control parameter space (fonts, blur, noise, etc.). IDNet [1] already utilized Bayesian Optimization to tune parameters for generating synthetic ID documents using SSIM, and the primary contribution of this work is adding model consistency to the objective, which makes it an important but incremental improvement.

[1] L. Xie et al., "IDNet: A Novel Identity Document Dataset via Few-Shot and Quality-Driven Synthetic Data Generation," 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 2024, pp. 2244-2253, doi: 10.1109/BigData62323.2024.10825017.

Questions:
1. When the target document contains features that cannot be represented within the parameter space (such as a unique printing artifact), how does the BO behave? Would it try to find a "shortcut" by introducing non-realistic artifacts?
2. Do the changes in the generation parameters made by BO follow any specific pattern? For example, does BO change any particular generation parameter more frequently, or are there changes specific to a single model?
3. Training the guide model is a stochastic process that can be affected by the model seed. How robust are the generation parameters to these variations of the guide models? For example, if the same model were trained with different seeds, would the resulting generation parameters converge, or would they vary significantly?

EditLens Prediction: Fully human-written