AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education Through Automated Question Generation and Interactive Assessment
Soundness: 1: poor
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Summary:
This work presents AnveshanaAI, an application-based learning platform for artificial intelligence. AnveshanaAI presents learners with a personalized dashboard featuring streaks, levels, badges, and structured navigation across domains such as data science, machine learning, deep learning, transformers, generative AI, large language models, and multimodal AI, with scope to include more in the future. The authors also design gamified tracking with points and achievements to enhance engagement and learning, while switching between Playground, Challenges, Simulator, Dashboard, and Community modes supports exploration and collaboration.
Strengths:
1. The paper is easy to read and understand.
2. The related work makes it easy for readers to understand the context.
3. The authors intentionally highlight many keywords, which is helpful but can also be distracting.
Weaknesses:
1. The main contribution lies on the user-interface design side rather than the model/algorithm side. As such, this work may be better suited to UI venues such as the ACM Conference on Intelligent User Interfaces (IUI) than to machine learning conferences.
2. The experimental evaluation focuses more on properties of the dataset itself than on a proposed model or algorithm. This paper may therefore be more suitable as a dataset-track submission.
3. What do the results for the fine-tuned Mistral-7B model mean? It is unclear what advancement the authors want to demonstrate, as no other models or datasets are compared; the significance is therefore unclear.
4. The current experiments cannot really answer the three proposed research questions, given their limitations: no novel model design, no model comparisons, no ablation studies, and no human-factor studies with real learners.
5. Only one dataset, created by the authors, is used for evaluation. No other public datasets are included, which limits the rigor and effectiveness of this work.
6. An F1-score of 0.427 does not seem to indicate good performance. Moreover, there are no baseline models or ablation studies.
7. This work offers little insight into the model's contributions.
Overall, this work is not ready for formal publication. I suggest that the authors introduce a novel model design and run more rigorous experiments to strengthen this work.
Questions:
Please see my concerns and questions in the Weaknesses section.
Fully human-written
AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education Through Automated Question Generation and Interactive Assessment
Soundness: 2: fair
Presentation: 2: fair
Contribution: 1: poor
Rating: 0
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
This paper presents AnveshanaAI, an educational platform for AI/ML learning featuring automated question generation via fine-tuned Mistral-7B, gamification elements, and multiple interaction modes. The authors construct a 10K+ question dataset aligned with Bloom's taxonomy and deploy a web-based system with playground, challenges, simulator, and viva modes.
Strengths:
1. Addresses a relevant problem in AI/ML education
2. Comprehensive statistical analysis of the dataset (entropy, Bloom's taxonomy coverage, etc.)
3. Dataset publicly available on HuggingFace
4. Multi-modal platform design with gamification elements
Weaknesses:
1. Wrong venue: applications papers at ICLR must contribute novel ML methods or representation learning insights. This paper proposes no new ML techniques.
2. Minimal validation of core claims: no human evaluation
3. Weak experimental methodology: fine-tuning on only 10K utterances
4. Insufficient dataset validation
5. Missing critical comparisons with similar platforms
Questions:
1. Most critical: where are the engagement measurements claimed in the abstract? Did any students actually use this platform?
2. Have you validated that generated questions are factually correct and pedagogically appropriate?
3. Why is BERTScore F1 so low (0.427)? How does this compare to baselines?
4. How does question quality compare to human experts or GPT-4?
5. Why should this appear at ICLR rather than, e.g., AIED?
Heavily AI-edited
AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education Through Automated Question Generation and Interactive Assessment
Soundness: 2: fair
Presentation: 1: poor
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
The authors propose AnveshanaAI, an application-based learning platform designed to support artificial intelligence education.
Strengths:
The proposed AnveshanaAI platform is designed to support artificial intelligence education and demonstrates its application value.
Weaknesses:
- This work is primarily application-oriented, and the authors present it more as a project or program report than as a rigorous academic paper.
- Consequently, the writing, organization, and individual sections show limited connection to a well-defined research question or hypothesis. In its current form, it is difficult to identify clear evaluation criteria or a scientific contribution to assess.
Questions:
NA
Moderately AI-edited
AnveshanaAI: A Multimodal Platform for Adaptive AI/ML Education Through Automated Question Generation and Interactive Assessment
Soundness: 1: poor
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
This paper presents an AI/ML learning platform that includes questions on various AI topics, covering different difficulty levels and Bloom's taxonomy categories. The authors describe the dataset of questions used in the platform, along with its system architecture and user interface. They analyze the dataset's statistics and explore the relationships between question category, difficulty, and taxonomy level to evaluate its quality. Finally, they report results from fine-tuning a language model using this dataset.
Strengths:
- The dataset is extensive and includes a range of taxonomy types and difficulty levels.
Weaknesses:
Overall, the paper reads more like a technical report describing an educational framework than a research paper presenting novel findings. The main issues are as follows:
- It is not clear to me how the three research questions proposed on the first page are addressed in this work.
- The dataset generation process is not clearly described. It appears that instructors created the questions using the proposed framework, but details of the annotation process are missing; for example, how difficulty levels were defined or determined.
- The distribution of data across categories and difficulty levels is not reported.
- The paper does not include any sample questions from the dataset, which makes it difficult to assess its quality or variety.
- The purpose of the fine-tuning experiments is unclear. Moreover, the questions generated by the model are not evaluated for quality or correctness; reporting improved perplexity alone is not sufficient to demonstrate that the model produces high-quality questions.
- The scope of the dataset is also limited to the AI domain.
Questions:
In addition to the points mentioned in the Weaknesses section, I have the following questions:
- How is user engagement measured? Are there any statistics or analyses showing how users interacted with the Q&A content on the platform?
- How were the difficulty scaling and cross-mode adaptation described in Section 2.2 conducted?
Lightly AI-edited |