ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 3 (75%) | 5.33 | 4.33 | 2917 |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 0 (0%) | N/A | N/A | N/A |
| Fully human-written | 1 (25%) | 4.00 | 5.00 | 4431 |
| Total | 4 (100%) | 5.00 | 4.50 | 3295 |
---

**3DLAND: 3D Lesion Abdominal anomaly Localization Dataset**

Soundness: 2 (fair) · Presentation: 2 (fair) · Contribution: 2 (fair) · Rating: 2 (reject) · Confidence: 5 (absolutely certain; very familiar with the related work, details checked carefully)

**Summary.** The paper introduces 3DLAND, a large-scale dataset of 6,000+ abdominal CT scans with over 20,000 organ-aware 3D lesion annotations across seven organs (liver, kidneys, pancreas, etc.). The authors propose a three-phase pipeline leveraging MedSAM1/2 and spatial reasoning to generate volumetric lesion masks from the DeepLesion dataset, claiming expert validation with Surface Dice > 0.75 (a sketch of this metric follows the review). The goal is to enable robust benchmarking for anomaly localization in multi-organ abdominal CT.

**Strengths.**
1. Multi-organ, lesion-level 3D annotation is a genuine unmet need. Existing datasets (e.g., LiTS, KiTS) are organ-specific; DeepLesion lacks 3D masks. 3DLAND's attempt to bridge this gap is highly relevant.
2. The integration of prompt-based segmentation (MedSAM) with organ-aware spatial reasoning offers a pragmatic approach to semi-automated annotation, potentially reducing expert burden.
3. The paper reports Dice, Surface Dice, and ablation studies on prompt design and assignment thresholds, suggesting comprehensive evaluation.

**Weaknesses.**
1. Several cited works appear non-existent or AI-generated, including but not limited to:
   - Ke Yan, Xiaohuan Wang, Mahmood Bagheri, Le Lu, and Ronald M. Summers. LesaNet: Robust lesion attribute segmentation in CT scans. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 622–630. Springer, 2019.
   - Xiyue Zhao, Fangyu Tang, Xin Wang, Yu Song, Wenjia Zhang, and Jian Xiao. Abdomenatlas-8k: A hierarchical 3d abdominal multi-organ segmentation benchmark, 2023a.
   - Zhi Zhao, Ziyu Wang, Yifan Li, Ziheng Liu, Yan Wang, Alan L. Yuille, et al. WORD: A large-scale dataset for whole-body organ segmentation in CT images, 2023b.
   - Yuyin Zhou, Lequan Xie, Dinggang Shen, and Lei Xing. MULAN: Multiscale universal lesion analysis network for CT scans. IEEE Transactions on Medical Imaging, 40(4):1099–1112, 2021b.
   - Yuyin Zhou, Lequan Xie, Dinggang Shen, and Lei Xing. MVP-Net: Multi-view feature pyramid network for universal lesion detection. In Medical Image Computing and Computer Assisted Intervention (MICCAI), pp. 37–47. Springer, 2021c.
2. Annotations are derived from DeepLesion bounding boxes, not original radiologist-drawn 3D masks. The "expert validation" is limited to 10% re-annotation, raising concerns about error propagation from DeepLesion's 2D boxes.
3. The pipeline is essentially a post-processing wrapper around existing models (MONAI, MedSAM). No new algorithmic contribution is made, only an application of off-the-shelf tools.

**Questions.** Please carefully address the aforementioned weaknesses.

**EditLens Prediction:** Fully AI-generated
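Several of the reviews weigh the reported Surface Dice > 0.75 claim, so a minimal voxel-based sketch of a surface Dice at a fixed tolerance is given below for reference. It illustrates the metric family only, assuming a tolerance measured in voxels and isotropic spacing; the paper's exact definition is not quoted in any review, and the function name `surface_dice` is ours.

```python
import numpy as np
from scipy import ndimage

def surface_dice(pred, gt, tol=1.0):
    """Normalized surface Dice at tolerance `tol` (in voxels).

    Simplified voxel-based variant: a boundary voxel counts as matched
    if it lies within `tol` of the other mask's boundary. Anisotropic
    voxel spacing, which a faithful implementation would pass to the
    distance transform, is ignored here for brevity.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)

    def boundary(mask):
        # Boundary = mask voxels removed by a single erosion step.
        return mask & ~ndimage.binary_erosion(mask)

    bp, bg = boundary(pred), boundary(gt)
    if not bp.any() or not bg.any():
        return float("nan")
    # Distance of every voxel to the nearest boundary voxel of each mask.
    dist_to_gt = ndimage.distance_transform_edt(~bg)
    dist_to_pred = ndimage.distance_transform_edt(~bp)
    matched = (dist_to_gt[bp] <= tol).sum() + (dist_to_pred[bg] <= tol).sum()
    return matched / (bp.sum() + bg.sum())

# Toy check: identical masks give surface Dice 1.0.
m = np.zeros((16, 16, 16), dtype=bool)
m[4:12, 4:12, 4:12] = True
print(surface_dice(m, m))  # -> 1.0
```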
---

**3DLAND: 3D Lesion Abdominal anomaly Localization Dataset**

Soundness: 4 (excellent) · Presentation: 4 (excellent) · Contribution: 4 (excellent) · Rating: 8 (accept, good paper) · Confidence: 3 (fairly confident; some parts of the submission or related work may not have been fully understood, details not carefully checked)

**Summary.** This paper addresses the significant challenge of limited large-scale, multi-organ, 3D lesion annotations in abdominal CT imaging by introducing 3DLAND, a comprehensive benchmark dataset. The core contribution is a novel, streamlined three-phase pipeline for generating organ-aware 3D lesion segmentation masks from the 2D bounding boxes provided in the DeepLesion dataset. The methodology involves:

- **Phase I:** Automated lesion-to-organ assignment using spatial reasoning (IoU and Euclidean distance) with MONAI-based organ segmentation.
- **Phase II:** Precise 2D lesion segmentation via a prompt-optimized MedSAM1 model, where a key innovation is the use of a shrunk bounding box (70%) and a center point to enhance accuracy (achieving a Dice score of 0.807); a prompt-construction sketch follows this review.
- **Phase III:** Volumetric 3D mask propagation using MedSAM2's memory-guided mechanism, resulting in clinically reliable 3D annotations (Surface Dice > 0.75).

The resulting dataset spans over 6,000 CT volumes and 20,000+ lesions across seven abdominal organs. The work establishes a new benchmark for evaluating 3D segmentation models and enables advances in anomaly detection and cross-organ analysis.

**Strengths.**
- Originality and pipeline design: The end-to-end pipeline is a major strength. The combination of automated spatial reasoning for organ assignment (Phase I) with carefully optimized prompt-based segmentation (Phase II) and advanced 3D propagation (Phase III) represents a significant methodological innovation.
- Rigor and scale: The empirical validation is exceptional. The paper demonstrates high performance (e.g., Dice ~0.81 in 2D, ~0.70 in 3D) on a very large and diverse dataset (20,000+ lesions, 7 organs), which strongly supports the reliability and generalizability of the approach. The extensive ablation studies (e.g., Figures 4 and 5) provide deep insights into the design choices.
- Clarity and reproducibility: The methodology is described in sufficient detail, and the use of established models (MONAI, MedSAM) enhances reproducibility. Figures 2 and 6 effectively illustrate the process and outcomes.

**Weaknesses.**
- Generalization to external data: The pipeline is developed and validated primarily on the DeepLesion dataset. Testing its performance on external cohorts from different institutions or with different CT scanning protocols would strengthen the claim of robustness and generalizability.
- Granularity of lesion characterization: While the dataset covers various lesion types (cysts, tumors), it currently lacks finer-grained annotations (e.g., benign vs. malignant classification, specific pathological subtypes). Adding such metadata in the future, as mentioned in Section 5, would greatly increase its clinical utility for tasks like risk stratification.
- Computational efficiency analysis: The computational cost of the pipeline (noted as 0.05 GPU hours/volume for Phase III) is mentioned but not compared against other potential 3D segmentation baselines (e.g., nnU-Net). A brief efficiency-accuracy trade-off analysis would be informative for users with limited resources.

**Questions.** N/A

**EditLens Prediction:** Fully AI-generated
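To make Phase II concrete, here is a minimal sketch, assuming the 70% figure scales each side of the box about its center, of how the shrunk-box-plus-center-point prompt could be built from a DeepLesion-style 2D bounding box. The function names (`shrink_box`, `build_medsam_prompts`) are illustrative, not the authors' code.

```python
def shrink_box(box, factor=0.7):
    """Shrink an (x_min, y_min, x_max, y_max) box about its center.

    `factor` scales each side length; 0.7 mirrors the 70% shrink the
    review describes, though whether the paper shrinks per side or per
    area is an assumption here.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) * factor / 2.0
    half_h = (y_max - y_min) * factor / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def build_medsam_prompts(box):
    """Return (shrunk box, positive center point) for a single 2D slice."""
    x_min, y_min, x_max, y_max = box
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    return shrink_box(box), center

# Example: a 40x20 px lesion box.
print(build_medsam_prompts((100, 50, 140, 70)))
# -> ((106.0, 53.0, 134.0, 67.0), (120.0, 60.0))
```

A plausible reading of why this prompt design helps: the tighter box suppresses background at the lesion margin, while the center point anchors the prediction to the lesion interior.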
---

**3DLAND: 3D Lesion Abdominal anomaly Localization Dataset**

Soundness: 3 (good) · Presentation: 3 (good) · Contribution: 3 (good) · Rating: 6 (marginally above the acceptance threshold) · Confidence: 5 (absolutely certain; very familiar with the related work, details checked carefully)

**Summary.** This paper presents 3DLAND, the first large-scale, organ-aware 3D lesion dataset for contrast-enhanced abdominal CT scans. It addresses the absence of datasets that jointly provide multi-organ coverage, volumetric lesion masks, and explicit lesion-to-organ associations, which are crucial for clinical anomaly localization. The authors developed a three-phase automated pipeline combining spatial reasoning for organ assignment, prompt-optimized 2D segmentation with MedSAM1, and memory-guided 3D propagation using MedSAM2, validated by expert radiologists. The dataset contains over 6,000 CT volumes and 20,000 lesions across seven organs, achieving 2D Dice = 0.807 and 3D Dice ≈ 0.75. Released under CC BY 4.0, 3DLAND establishes a new benchmark for organ-aware 3D segmentation and cross-organ representation learning.

**Strengths.**
1. This work proposes the first large-scale, organ-aware 3D lesion dataset (3DLAND) linking over 20,000 lesions to seven abdominal organs, filling the gap left by datasets like DeepLesion and ULS23 and enabling clinically interpretable cross-organ benchmarking.
2. This work introduces a three-phase automated pipeline combining spatial reasoning, optimized 2D prompts, and memory-guided 3D propagation, offering an efficient and accurate method to transform 2D lesion boxes into expert-level 3D masks.
3. This work provides strong experimental validation with large-scale testing, expert review, and detailed ablations, establishing 3DLAND as a reproducible and clinically reliable benchmark for future organ-aware segmentation research.

**Weaknesses.**
1. Lack of external validation across imaging domains. All data originate from DeepLesion, limiting generalizability across scanners and contrast phases. No cross-dataset tests (e.g., AMOS22, AbdomenCT-1K) are provided to verify whether fixed IoU and distance thresholds remain stable under varied acquisition conditions.
2. Lack of methodological novelty beyond existing SAM frameworks. The pipeline mainly combines MONAI, MedSAM1, and MedSAM2 without substantive algorithmic innovation. Prior work such as 3DSAM-Adapter and Slide-SAM already demonstrated volumetric adaptation and memory propagation in comparable ways.
3. Lack of demonstrated clinical feasibility. Although expert validation is reported, there is no reader study or workflow analysis to show how 3DLAND improves diagnostic efficiency or inter-reader consistency over 2D annotations like DeepLesion.

Ref: MSWAL: 3D multi-class segmentation of whole abdominal lesions dataset.

**Questions.**
1. Could the authors clarify how robust the IoU > 10% and distance < 20 px thresholds are when applied to CT scans with different resolutions or contrast phases? A cross-center or multi-protocol validation would help determine whether these spatial heuristics remain valid outside DeepLesion. (A sketch of this assignment heuristic follows the review.)
2. How are ambiguous or overlapping lesions handled when they straddle two organs (e.g., hepatic hilum or pancreatic head)? The current one-to-one lesion–organ assignment might oversimplify such cases; a multi-label or probabilistic linkage could better reflect anatomical uncertainty.

**EditLens Prediction:** Fully AI-generated
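For reference on Question 1, the quoted thresholds (IoU > 10%, centroid distance < 20 px) could be composed roughly as below. This is a sketch under stated assumptions, not the authors' pipeline: it operates on 2D boxes from the lesion's key slice, assumes the IoU is taken against a box around each organ's segmentation mask, checks IoU first with centroid distance as a fallback, and leaves tie-breaking unspecified.

```python
import numpy as np

IOU_THRESHOLD = 0.10   # "IoU > 10%" as quoted in the review
DIST_THRESHOLD = 20.0  # "distance < 20 px" as quoted in the review

def box_iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def centroid(box):
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def assign_lesion_to_organ(lesion_box, organ_boxes):
    """Map a lesion box to an organ name, or None if nothing matches.

    `organ_boxes`: organ name -> box of that organ's segmentation mask
    on the lesion's key slice. The highest-IoU organ wins if it clears
    the IoU threshold; otherwise the nearest centroid within the
    distance threshold is used as a fallback.
    """
    best = max(organ_boxes, key=lambda o: box_iou(lesion_box, organ_boxes[o]))
    if box_iou(lesion_box, organ_boxes[best]) > IOU_THRESHOLD:
        return best
    dists = {o: np.linalg.norm(centroid(lesion_box) - centroid(b))
             for o, b in organ_boxes.items()}
    nearest = min(dists, key=dists.get)
    return nearest if dists[nearest] < DIST_THRESHOLD else None

# Example: a lesion box overlapping the (hypothetical) liver box.
organs = {"liver": (80, 40, 200, 160), "right_kidney": (150, 170, 220, 230)}
print(assign_lesion_to_organ((100, 60, 140, 100), organs))  # -> "liver"
```

Note that a small lesion inside a large organ box yields a low IoU by construction, which is exactly why the robustness of these fixed thresholds across resolutions, as the question raises, matters.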
---

**3DLAND: 3D Lesion Abdominal anomaly Localization Dataset**

Soundness: 2 (fair) · Presentation: 3 (good) · Contribution: 3 (good) · Rating: 4 (marginally below the acceptance threshold) · Confidence: 5 (absolutely certain; very familiar with the related work, details checked carefully)

**Summary.** The paper publishes a new dataset focused on segmenting different organ-wise lesions in a joint framework. It increases the overall scale of 3D lesion datasets to 6,000 volumes using a semi-automated process that chains organ segmentation -> lesion-to-organ assignment -> 2D bounding-box to 2D segmentation transfer -> 2D segmentation to 3D volume translation. In the paper, the authors validate their data-annotation/cleaning process on a subset of their data, which they simultaneously optimize their automated process on.

**Strengths.** The paper provides a large dataset of 6,000 3D volumes with lesion and organ segmentations. This has the potential to yield a reliable baseline that the 3D medical image computing domain can optimize its automated methods on, as currently one is required to train multiple methods on multiple segmentation benchmarks, with different methods often having different patient splits or using noisy datasets [1]. Hence I find the concept of the dataset very interesting.

**Weaknesses.** My main concerns with this paper are the methods used for automated segmentation generation and the final segmentation mask quality, as previous datasets like AbdomenAtlas, which also followed a semi-supervised segmentation procedure, had substantial quality issues:

### Organ segmentation
While I am not familiar with the MONAI framework used for organ segmentation, the recent nnU-Net Revisited [1] and Touchstone Benchmark [2] both showed that methods trained in MONAI were less powerful than those using the nnU-Net framework. Moreover, I don't see why the authors would not just use the established TotalSegmentator [3] or an ensemble/majority vote of multiple frameworks to maximize organ segmentation accuracy. Additionally, I would like the authors to provide not just the mean organ segmentation Dice on L215 but the DSC per organ. This is especially important as, e.g., the liver usually has DSC > 97% while smaller organs like the gallbladder are substantially worse. Just reporting the mean is not transparent enough.

### Prompt to segmentation (2D and 3D)
The authors use interactive segmentation methods to generate 2D segmentations from the 2D bounding boxes and then go to 3D. However, their chosen promptable methods miss a few important alternatives, like ScribblePrompt [4] or nnInteractive [5]. In particular, nnInteractive should be included, as it reports very high performance and can use 2D prompts (2D bounding boxes) to yield 3D segmentations. This removes one additional step, which may otherwise introduce errors. While this may seem nitpicky, I believe the success of this benchmark dataset hinges solely on the lesion segmentation quality. Hence the authors should try to make this as reliable as possible and should exhaust all resources available to them. Moreover, adding this as an addendum afterward will just reduce the impact of the paper, as people won't adopt it after trying an unfinished version. (This is also largely due to not being able to inspect some samples from the dataset.)

Minor: The authors used the wrong citation for SAM2 in L298.

**References:**
[1] Isensee, Fabian, et al. "nnU-Net revisited: A call for rigorous validation in 3D medical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.
[2] Bassi, Pedro RAS, et al. "Touchstone benchmark: Are we on the right way for evaluating AI algorithms for medical segmentation?" Advances in Neural Information Processing Systems 37 (2024): 15184-15201.
[3] Wasserthal, Jakob, et al. "TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images." Radiology: Artificial Intelligence 5.5 (2023): e230024.
[4] Wong, Hallee E., et al. "ScribblePrompt: Fast and flexible interactive segmentation for any biomedical image." European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024.
[5] Isensee, Fabian, et al. "nnInteractive: Redefining 3D promptable segmentation." arXiv preprint arXiv:2503.08373 (2025).

**Questions.**
Q1: What is the per-organ segmentation performance (L241)? (A sketch of the per-organ Dice computation follows this review.)
Q2: Why does 10% of the dataset represent 300 cases (L323)? Given roughly 6,000 volumes, shouldn't this be 600?

**EditLens Prediction:** Fully human-written
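Q1 amounts to asking for Dice per label rather than the mean over organs; a minimal sketch of that computation on integer label maps follows. The label indices and array shapes are illustrative only.

```python
import numpy as np

def per_label_dice(pred, gt, labels):
    """Dice coefficient per label for integer label maps of equal shape."""
    scores = {}
    for lab in labels:
        p, g = pred == lab, gt == lab
        denom = p.sum() + g.sum()
        # NaN when a label is absent from both maps (undefined Dice).
        scores[lab] = 2.0 * (p & g).sum() / denom if denom else float("nan")
    return scores

# Toy usage with three classes (0 = background, 1 = liver, 2 = gallbladder).
rng = np.random.default_rng(0)
pred = rng.integers(0, 3, size=(8, 8, 8))
gt = rng.integers(0, 3, size=(8, 8, 8))
print(per_label_dice(pred, gt, labels=[1, 2]))
```

Reporting such a per-organ table, rather than its mean, would expose exactly the liver-versus-gallbladder gap the review worries about.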