ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction     Count      Avg Rating   Avg Confidence   Avg Length (chars)
Fully AI-generated      0 (0%)     N/A          N/A              N/A
Heavily AI-edited       0 (0%)     N/A          N/A              N/A
Moderately AI-edited    0 (0%)     N/A          N/A              N/A
Lightly AI-edited       0 (0%)     N/A          N/A              N/A
Fully human-written     4 (100%)   0.50         4.25             1462
Total                   4 (100%)   0.50         4.25             1462

Review 1

Title: AsseslyAI: AI–Powered Assessment Framework for Skill-Oriented Engineering Lab Education
Soundness: 2: fair
Presentation: 1: poor
Contribution: 1: poor
Rating: 0
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
EditLens Prediction: Fully human-written

This paper describes a platform for delivering personalised practical exercises for computer science students. The paper describes the overall system architecture and key features, as well as a dataset of 10,000 LLM-generated questions that has been created. A model to generate more questions is then fine-tuned on this dataset. Some evaluations are presented to validate the questions in the synthetic dataset and those generated by the fine-tuned model.

The main strengths of the paper are:
- The paper addresses the use of LLMs to support computer science education through the generation of personalised learning materials, which is a promising research area.
- The generated set of 10,000 questions could be a useful contribution to the research community.

The main weaknesses of the paper are:
- Large parts of the paper are descriptions of system, platform, and interface design, which have very little relevance to ICLR.
- The paper needs careful review and revision, as it has many typographical and language issues.
- It is not clear what is actually being tested in the evaluation experiments described by the authors. The evaluations compare marksAI and marksFaculty, but it is not clear what these actually measure.
- The contributions of the paper are not made explicit.

What exactly do marksAI and marksFaculty measure, and why are they needed in the synthetic question database? How is the following claim supported: "Unlike competitive programming datasets such as CodeNet Puri et al. (2021), our synthetic dataset is explicitly aligned with computer science lab curricula. It emphasizes implementation skills, conceptual understanding, and explainability-oriented evaluation rather than correctness alone, better reflecting the learning objectives of practical coursework."?

Review 2

Title: AsseslyAI: AI–Powered Assessment Framework for Skill-Oriented Engineering Lab Education
Soundness: 2: fair
Presentation: 1: poor
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
EditLens Prediction: Fully human-written

This work focuses on building a framework for automatic assessment and feedback generation for students. Specifically, the authors built a dashboard for students and teachers to view the progress of students. They also synthesized a set of 10K problems to avoid potential cheating issues.

+ The system, if implemented correctly, should make meaningful contributions to the domain of intelligent tutoring systems.

- The framing of the work does not make sense -- what exactly is the motivation? The motivation framing of the paper is simply not coherent. The lack of engagement in labs may not be directly associated with cheating -- that is a separate issue. In addition, why would these new assessment tools help address the issues mentioned before? This is just not clear in the introduction.
- Multiple validation or labeling processes require further details. For example, the difficulty levels are manually labeled for a large set of 10K questions. How many people were involved? What was their level of expertise, and did they have a consistent guideline for labeling difficulty? These points are unclear.
- There is no evaluation of the actual usage of the system. What is the population of students? Did they have learning gains from these systems, and how did the experiment work? These points are critical for educational AI work.
- Line 46: How would unique questions for each student be ensured?
- Section 2.1: These two things are just different -- they can be separate sessions.
- Section 2.2: Difficulty calibration is not introduced, and I also don't think it is needed.
- Line 250: How would an XGBoost model generate these? How exactly do you generate feedback?
- Figure 4: It covers January to August -- did you run the system across semesters?

Review 3

Title: AsseslyAI: AI–Powered Assessment Framework for Skill-Oriented Engineering Lab Education
Soundness: 1: poor
Presentation: 1: poor
Contribution: 1: poor
Rating: 0
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
EditLens Prediction: Fully human-written

The paper aims to improve lab work in computer science education by automatically creating lab problems based on teacher input and difficulty levels. These questions can then be assigned to each student. Towards this goal, the authors create 10K question-answer pairs. The question-answer pairs are evaluated based on their similarity and on automated grading compared against manual grading.

The strength of the paper is that it tackles an important problem: AI-supported, individual learning. The authors seem to have implemented a working prototype of their system, including a web interface with a student/teacher dashboard. The system also seems to feature a speech interface for viva questions.

The paper reads more like an implementation report of an AI lab-management system (e.g., authentication and access control). The challenges it wants to address are very wide: plagiarism, lack of proper lab records, unstructured lab conduction, inadequate execution and assessment, lack of practical learning, limited student engagement, and absence of progress tracking. These challenges seem to have little overlap with what is then presented and evaluated later in the paper.

I think this paper is not a good fit for ICLR. I suggest the authors revise the paper, i.e., better define its scope and design appropriate experiments, and then submit it to a more applied conference/venue, perhaps in the educational science domain.

Review 4

Title: AsseslyAI: AI–Powered Assessment Framework for Skill-Oriented Engineering Lab Education
Soundness: 1: poor
Presentation: 1: poor
Contribution: 1: poor
Rating: 0
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
EditLens Prediction: Fully human-written

This paper claims to use a model fine-tuned on 10k (or 8k) programming lab questions to generate different questions for individual students, of varying difficulty levels, with automatic grading.

Strengths:
1. Using AI to generate questions for individual students may be useful.
2. Adapting question difficulty to individual students would be useful.

Weaknesses:
1. Unfortunately, this work lacks any human validation of its claims.
2. It is unclear whether parts of the paper or the reproducibility checklist were written by a human or revised/hallucinated by an LLM to meet submission criteria.
3. The paper includes inconsistencies about implementation details.
4. Missing data and code.
5. Missing evidence for claims made in the abstract.

Were parts of the paper written or revised by an LLM? Were the figures, or the code used to generate the figures, produced by an LLM? Was there human oversight?