|
CAREFL: Context-Aware Recognition of Emotions with Federated Learning |
Soundness: 2: fair
Presentation: 1: poor
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper proposes a federated learning framework for emotion recognition from images, designed to balance contextual reasoning, privacy, and computational efficiency. The system operates in two stages: (1) a large vision–language model (LLaVA 1.5) generates contextual captions for each image, and (2) a lightweight vision–language model (SMOLVLM2) is fine-tuned with Quantized Low-Rank Adaptation (QLoRA) in a federated setting. This design enables decentralized training without sharing raw data while leveraging semantic context from the larger model. Experiments on EMOTIC and CAER-S datasets show that CAREFL achieves higher mean average precision and F1-scores compared to larger centralized models such as GPT-4o and LLaVA, while reducing memory usage and model size.
The paper’s contributions include: (1) proposing a novel two-phase federated framework combining large-model context generation with small-model adaptation, (2) introducing an efficient QLoRA-based fine-tuning scheme for lightweight federated training, and (3) comparative and ablation studies across datasets, client numbers, aggregation methods, and quantization settings.
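For concreteness, the federated adapter-aggregation step described above amounts to something like the following minimal sketch (plain PyTorch with a toy LoRA layer and unweighted FedAvg; the layer sizes, the 26 EMOTIC labels, and all hyperparameters are illustrative assumptions rather than the paper's actual configuration):
```python
import copy
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy stand-in for a LoRA-adapted layer: a frozen base weight plus
    trainable low-rank factors A and B. Only A and B would be trained
    and communicated in the federated rounds."""
    def __init__(self, d_in, d_out, rank=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.zeros(rank, d_in))
        self.B = nn.Parameter(torch.randn(d_out, rank) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ self.A.t() @ self.B.t()

def local_update(global_model, data, steps=5, lr=1e-3):
    """One client's local fine-tuning of the adapter parameters only."""
    model = copy.deepcopy(global_model)
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(trainable, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()  # multi-label objective, as in EMOTIC
    x, y = data
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return {k: v.detach() for k, v in model.named_parameters() if v.requires_grad}

def fedavg(adapter_states):
    """Plain (unweighted) FedAvg over the clients' adapter tensors."""
    return {k: torch.stack([s[k] for s in adapter_states]).mean(dim=0)
            for k in adapter_states[0]}

torch.manual_seed(0)
global_model = LoRALinear(d_in=16, d_out=26)  # 26 = number of EMOTIC labels
clients = [(torch.randn(8, 16), torch.randint(0, 2, (8, 26)).float())
           for _ in range(3)]                 # 3 clients with synthetic data
adapter_states = [local_update(global_model, d) for d in clients]
global_model.load_state_dict(fedavg(adapter_states), strict=False)
```
The point of the sketch is that only the small adapter tensors cross the network, which is where the claimed communication and memory savings come from.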
Despite its technical framing, the paper appears conceptually weak and shallow in execution:
(1) The link between “context awareness” and federated learning is not clearly articulated. Context generation is performed offline using an existing large model, not integrated dynamically into the FL process. This makes the “context-aware” claim superficial.
(2) Illustrations and explanations lack clarity. Figures 1 and 2 are schematic and omit crucial architectural or algorithmic details; the paper mostly reuses known components (YOLO, LLaVA, QLoRA) with limited methodological innovation.
(3) Evaluations rely on narrow datasets (EMOTIC, CAER-S) without broader benchmarking or statistical significance analysis; performance comparisons against massive centralized models do not appear to be fair and are not analyzed in depth.
(4) Many claims (e.g., “context improves emotion recognition”) are intuitive but not theoretically supported or quantitatively dissected.
Overall, the presentation feels more like a system demonstration than a rigorous ICLR-level contribution; key insights or innovations are missing.
1. How exactly does “context awareness” influence the federated learning process? Does context affect model aggregation or only data preprocessing? Why was context generation performed offline instead of being integrated dynamically during training?
2. How does the framework generalize to other tasks beyond emotion recognition?
3. How are biases or errors from LLaVA-generated captions mitigated during federated fine-tuning? |
Fully AI-generated |
|
CAREFL: Context-Aware Recognition of Emotions with Federated Learning |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The paper proposes CAREFL, a two-phase framework for multimodal emotion recognition that (1) uses a large frozen VLM (LLaVA-1.5) offline to generate rich scene/subject contextual descriptions, and (2) fine-tunes a small, efficient SVLM (SMOLVLM2) on client devices in a federated manner using quantized low-rank adapters (QLoRA). Experiments on EMOTIC (multi-label) and CAER-S (7 classes) show large gains in mAP and varying gains in F1/Recall.
1. The paper proposes a lightweight training approach that is shown to achieve promising model performance.
2. The paper conducts a comprehensive evaluation covering different aggregation algorithms (FedDyn, FedAvg, FedProx, FedAdam), LoRA ranks, and quantization settings (4-bit QLoRA vs. full-precision LoRA).
3. Large performance improvement on the EMOTIC benchmark.
1. Claims of outperforming huge baselines need more careful parity checks. The paper states that CAREFL outperforms GPT-4o, LLaVA, and other heavy models, but many of these baselines are evaluated in zero-shot or prompting setups while CAREFL is fine-tuned (and in a federated setting).
2. Lack of evaluation benchmarks. The proposed model and baselines are mostly evaluated on EMOTIC, and the results of the proposed model on CAER-S are not compared against any baselines.
1. For the results on EMOTIC, why is mAP so high while recall and F1 are modest?
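One factor worth ruling out (a hypothetical illustration with made-up numbers, not the paper's results) is that mAP is threshold-free and only rewards ranking, while recall/F1 depend on the decision threshold, so the two can diverge sharply:
```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Toy multi-label scores: every positive label is ranked above every negative
# one (so ranking-based mAP is perfect), but no score crosses a 0.5 threshold.
y_true = np.array([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 1, 1]])
y_score = np.array([[0.45, 0.40, 0.10, 0.05],
                    [0.48, 0.12, 0.44, 0.08],
                    [0.15, 0.47, 0.43, 0.30]])

print(average_precision_score(y_true, y_score, average="macro"))  # 1.0
print(f1_score(y_true, (y_score >= 0.5).astype(int),
               average="macro", zero_division=0))                 # 0.0
```
Clarifying how the decision thresholds for the reported F1/recall were chosen would help interpret this gap. |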
Fully human-written |
|
CAREFL: Context-Aware Recognition of Emotions with Federated Learning |
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The paper presents CAREFL, a framework designed for efficient and privacy-preserving emotion recognition. First, a large vision-language model (LLaVA 1.5) generates contextual descriptions of images to enrich semantic information. Second, a lightweight model (SMOLVLM2) is fine-tuned using QLoRA within a federated learning setup. This method allows distributed training without sharing raw data. Experiments on the EMOTIC and CAER-S datasets show that CAREFL achieves high accuracy and F1-scores while significantly reducing computational and memory requirements.
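If I understand the second phase correctly, the client-side setup corresponds roughly to the following sketch (Hugging Face transformers/peft style; the rank, target modules, and checkpoint name are my assumptions for illustration, not the paper's exact configuration):
```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization of the frozen lightweight backbone (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapters are the only trainable (and communicated) parameters.
lora_config = LoraConfig(
    r=8,                                  # rank: assumed, not the paper's value
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

# Typical next steps (omitted here because they require downloading weights):
#   model = AutoModelForVision2Seq.from_pretrained(<SMOLVLM2 checkpoint>,
#                                                  quantization_config=bnb_config)
#   model = get_peft_model(model, lora_config)
# Each client then fine-tunes only the adapter weights on its local
# (image + LLaVA-generated context) examples and sends them for aggregation.
```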
1. The two-phase design, which cleverly combines large VLMs for context generation with lightweight models for federated learning, is reasonable.
2. Experiments show strong performance, surpassing larger centralized models like GPT-4o and LLaVA.
3. The paper is well-written and easy to read.
1. The proposed two-phase design relies on rich contextual descriptions generated offline with LLaVA 1.5. However, in real-world or real-time emotion recognition scenarios, such offline pre-generation is impractical due to latency, computational overhead, and privacy constraints. This is inconsistent with the authors' claims.
2. The experimental setup overlooks realistic aspects of federated learning, such as heterogeneous client data distributions, communication latency, and device variability.
3. Could you show examples of successful and failed predictions for discussion?
Please see Weaknesses. |
Lightly AI-edited |