ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction Count Avg Rating Avg Confidence Avg Length (chars)
Fully AI-generated 2 (67%) 5.00 4.00 3168
Heavily AI-edited 0 (0%) N/A N/A N/A
Moderately AI-edited 0 (0%) N/A N/A N/A
Lightly AI-edited 0 (0%) N/A N/A N/A
Fully human-written 1 (33%) 4.00 4.00 2670
Total 3 (100%) 4.67 4.00 3002
Individual Reviews
Augmenting Research Ideation with Data: An Empirical Investigation in Social Science

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper explores the problem of low feasibility and effectiveness in research ideas generated by Large Language Models (LLMs). The authors propose a data-augmented ideation framework to improve idea quality, which introduces (1) metadata-guided idea generation—providing dataset descriptions to LLMs to guide feasible idea generation, and (2) automatic preliminary validation—allowing LLMs to conduct empirical checks on hypotheses using available data. Experiments are conducted in the domain of climate negotiations. The authors construct a dataset collection called CLIMATEDATABANK, perform automatic and human evaluations on feasibility, novelty, and effectiveness, and further run a user study with researchers to assess inspiration effects. Results show improvements compared to non-augmented baselines.

Strengths:
1. The paper is clearly written, logically organized, and easy to follow.
2. Applying LLM-based ideation to the social science domain is an interesting and relatively unexplored area.
3. The inclusion of a human study is commendable, as it goes beyond evaluating the LLM’s direct outputs and provides preliminary empirical evidence of the framework’s practical utility.
4. The creation of the CLIMATEDATABANK is a useful resource for future research in this specific social science domain.

Weaknesses:
1. The proposed data-augmented ideation is a relatively straightforward extension of existing frameworks. Adding dataset descriptions as metadata and performing simple validation are natural incremental steps, not a fundamentally new approach or theoretical contribution.
2. The automatic validation process is largely descriptive and based on keyword counts or correlations, not rigorous statistical or causal analysis (see the sketch after this review). It is unclear how reliable or generalizable these validations are, and they do not convincingly demonstrate an improvement in true hypothesis verification.
3. All experiments are conducted in one social science topic (climate negotiation). This makes it difficult to generalize claims about research ideation in general.
4. The paper reveals an unaddressed trade-off: incorporating metadata enhances feasibility but reduces novelty, suggesting that the framework systematically biases idea generation toward safer, data-driven concepts at the expense of creativity—a limitation that remains insufficiently analyzed or discussed.

Questions:
1. The paper's main intervention seems to push LLMs towards "safer" ideas that are directly verifiable with the provided data, resulting in a drop in novelty. How do you see this framework mitigating the risk of simply generating obvious or incremental ideas that are "low-hanging fruit" in the data, rather than genuinely novel research directions?
2. The evaluation is conducted exclusively within the domain of climate negotiations. How generalizable is the proposed framework to other scientific disciplines, where data structures, hypothesis formulation, and validation standards differ substantially? Have the authors considered cross-domain experiments to support broader applicability?

EditLens Prediction: Fully AI-generated
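For concreteness, the kind of descriptive check criticized in Weakness 2 above might look like the minimal sketch below: a single correlation computed over one table from the dataset collection. The file name, column names, and significance cutoff are hypothetical illustrations, not details taken from the paper.

```python
# Minimal sketch of a descriptive preliminary check of the kind the review
# describes: a simple correlation between two variables in one CSV table.
# The file name, column names, and the 0.05 cutoff are hypothetical
# placeholders, not details from the paper under review.
import pandas as pd
from scipy import stats

df = pd.read_csv("climatedatabank/negotiation_outcomes.csv")  # hypothetical file
x = df["delegation_size"].dropna()
y = df["pledge_ambition_score"].dropna()
n = min(len(x), len(y))

# Pearson correlation as a rough plausibility signal for a hypothesis;
# this is descriptive evidence only, not a causal or rigorously specified test.
r, p = stats.pearsonr(x.iloc[:n], y.iloc[:n])
verdict = "plausible" if p < 0.05 else "not supported by this check"
print(f"r = {r:.2f}, p = {p:.3f} -> hypothesis {verdict}")
```

As the weakness notes, a significant correlation of this kind is only a plausibility signal; it says nothing about confounding or causal direction.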
Augmenting Research Ideation with Data: An Empirical Investigation in Social Science

Soundness: 4: excellent
Presentation: 4: excellent
Contribution: 4: excellent
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper proposes a data-augmented framework to enhance Large Language Model (LLM) research ideation, specifically addressing the lack of feasibility and effectiveness in purely literature-driven ideas. The framework introduces dataset metadata during the idea generation stage to guide feasibility, and an automated preliminary validation step during idea selection to confirm empirical plausibility. Experiments in the social science domain (climate negotiations) show that metadata significantly improves feasibility (by 20%) and expected effectiveness, while automated validation enhances the overall quality of selected ideas. Furthermore, a human study demonstrates that these augmented LLM-generated ideas successfully inspire human researchers to propose higher-quality research.

Strengths:
1. The paper provides a novel, two-pronged framework that effectively integrates empirical data signals (metadata and preliminary validation) directly into the LLM ideation pipeline, which is a significant step beyond existing literature-based approaches.
2. The results are robust, supported by both automatic evaluations using multiple LLM judges and a controlled human expert evaluation that confirms substantial gains in feasibility (20%) and expected effectiveness (18%).
3. The human study successfully validates the utility of the LLM-generated ideas, showing that they are not just quantitatively superior but also serve as a useful source of inspiration, leading researchers to propose superior ideas in practice.

Weaknesses:
1. The investigation is strictly confined to one niche domain (quantitative social science on climate negotiations) using a custom-built dataset (CLIMATEDATABANK). The framework’s transferability to other scientific disciplines or fields with less structured data remains unproven.
2. The data-aware generation process appears to impose an implicit constraint on creativity, resulting in a reported decline in the novelty metric for some experimental settings. This suggests a need to better manage the balance between empirical tractability and theoretical originality.
3. The automated validation step depends on the LLM’s ability to generate and execute correct code for hypothesis testing. The robustness and reliability of this LLM-as-statistician module are not deeply quantified (e.g., code error rate, statistical inference quality), posing a potential point of fragility.

Questions:
1. How would the data-augmented framework be adapted for and evaluated in research domains that primarily rely on qualitative data (e.g., interview transcripts, ethnographic notes) rather than the structured textual/panel/cross-sectional data currently in the CLIMATEDATABANK?
2. To mitigate the noted drop in novelty, could the authors introduce a tunable parameter in the prompt (e.g., a "creativity vs. practicality" score) to explicitly control the LLM's adherence to the provided metadata, allowing researchers to explore a wider spectrum of ideas?
3. Given the innovative but complex nature of LLM-driven automatic validation, what is the observed error rate (e.g., code execution failure, incorrect statistical conclusion) of the validation step, and what safeguards (e.g., code environment constraints, second-pass LLM review) are in place to ensure the validation signals are dependable? (One possible safeguard is sketched after this review.)

EditLens Prediction: Fully AI-generated
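To make the safeguards mentioned in Question 3 concrete, below is a minimal sketch of one such mechanism: executing an LLM-generated validation snippet in a separate, isolated process with a timeout, so that crashes and hangs surface as explicit failures rather than silent validation signals. This is an assumed illustration of the question, not the authors' implementation; the isolation flag and timeout value are arbitrary choices.

```python
# Illustrative sandbox wrapper for LLM-generated validation code: run it in a
# separate process with a timeout and report failures instead of trusting the
# output blindly. This sketches one safeguard raised in Question 3; it is not
# the paper's implementation, and the limits chosen here are arbitrary.
import subprocess
import sys
import tempfile

def run_validation_snippet(code: str, timeout_s: int = 60) -> dict:
    # Write the generated snippet to a temporary file for execution.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores user env/site
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {
            "ok": proc.returncode == 0,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
```

Counting the failures from such a wrapper would give a direct estimate of the code error rate the question asks about, and failed or suspicious runs could be routed to a second-pass LLM review.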
Augmenting Research Ideation with Data: An Empirical Investigation in Social Science

Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This work proposes to augment the conventional LLM idea generation pipeline with two additional steps: (1) grounding the idea generation on specific dataset metadata; and (2) an automated feasibility check for idea selection. The experiments are done in the social science domain. For the metadata conditioning, the authors first constructed a dataset called ClimateDataBank consisting of 22 datasets in CSV format, 8 reference papers, and manually curated research topics. Automatic and human evaluations show that adding metadata generally improves the feasibility of the generated ideas (especially in Table 3), where the difference in human preference is statistically significant. Next, the authors incorporate an automated validation step, where an LLM selects the appropriate dataset and runs code in a sandbox environment to validate the generated hypothesis. Automatic execution aligns with the ground-truth conclusions 70% of the time, and humans generally find the automatic execution trajectories useful (Table 4b). Furthermore, ideas selected by the automatic validation process are preferred by human experts across all metrics (Table 6).

Strengths:
- The authors did quite extensive human expert evaluation for all experiments in the paper, which makes the conclusions more convincing.
- The empirical results are positive, and human evaluators find the generated ideas helpful and inspiring.
- The proposed ideas (metadata conditioning and automatic validation) are extremely simple and easy to implement.

Weaknesses:
- My biggest concern is that all experiments are done in the social science domain (10 climate negotiation-related research topics in Appendix A), and I'm not sure whether the conclusions would generalize to other domains. For example, how would this work for empirical AI research? What would the automatic validation look like in that case? My biased view is that automatic validation is only possible for research problems where the execution is quite simple and straightforward, and you are going to see a lot more errors when you move to a more execution-heavy domain, for example, LLM post-training/pre-training research, where the code implementation is more involved and the execution requires GPUs as well.
- The metadata construction looks purely manual right now; how is this scalable?
- My overall judgement is that it's great to see the proposed pipeline work for the social science topics being tested, but I'm not quite convinced it will work the same for other, broader domains. I'm thus leaning borderline reject unless the authors can convince me during rebuttal that they made this work in another domain too.

Questions:
N/A

EditLens Prediction: Fully human-written