Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The proposed ETAP builds a scalable predictor by computing a gradient-alignment affinity score over shared parameters for task pairs and groups, then refining it with learned nonlinear transformations and residual corrections. Across benchmarks, ETAP improves MTL gain prediction and enables more effective task grouping, outperforming the baselines considered.
It combines gradient-based affinity with learned non-linear relationship modeling to capture task relationships efficiently and accurately, and it includes thorough component-wise ablations that clarify each contribution and improve interpretability.
Dividing learning tasks into groups based on similarity is a long-standing area [1]. The paper introduces new measures of task affinity for MTL, but I am not fully convinced that the proposed methods are superior to prior work. The baselines used are relatively dated, and a comparison of computational cost and predictive performance with stronger recent baselines, such as [2], would strengthen the claims. Efficient group-wise tracking of task affinity is also not new, as [3] tracks inter-task affinity in a group-wise manner during multi-task optimization. Finally, I am not convinced that the proposed methods clearly improve over other inter-task affinity tracking approaches, including [4].
[1] Which tasks should be learned together in multi-task learning?
[2] Task Grouping for Automated Multi-Task Machine Learning via Task Affinity Prediction
[3] Selective Task Group Updates for Multi-Task Optimization
[4] Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity
It would help to clarify the concrete differences from prior work, especially how your methods distinctively improve on gradient-based affinity and group-wise tracking approaches. Practical guidance on when to prefer your method over more recent baselines would make the contribution clearer.
Lightly AI-edited

Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper proposes ETAP (Ensemble Task Affinity Predictor), a framework for predicting multi-task learning (MTL) gains to enable efficient task grouping. The approach combines white-box gradient-based affinity scoring with data-driven ensemble prediction.
1. The two-stage ensemble design, which uses gradient-based affinity scores as a foundation and refines them with data-driven models, is reasonable: it combines the interpretability of white-box gradient-based scoring with the flexibility of data-driven ensemble prediction.
2. ETAP achieves impressive runtime reduction while maintaining or improving correlation with ground-truth gains. This is a meaningful practical contribution.
1. The gradient-based affinity score (Equation 5) is quite similar to existing work (TAG), merely removing the auxiliary forward/backward passes. The B-spline transformation and ridge regression are standard techniques. The main novelty seems to lie in combining these pieces, which feels somewhat incremental. Can you clarify what is fundamentally new here beyond engineering existing methods together?
2. While you claim ETAP is "scalable," all experiments use relatively small task sets (n=7-10). What happens when n=50 or n=100? The affinity score computation still requires training one MTL model with all tasks, and the pairwise scores grow as O(n²). The paper doesn't really demonstrate scalability to large task pools that appear in real applications.
3. Section 3.2.3 uses branch-and-bound from prior work, so the contribution is really just the gain prediction, not the grouping algorithm itself. This should be made clearer in the contribution claims.
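To make the "standard techniques" point in weakness 1 concrete, the second stage as I understand it amounts to a spline basis expansion of the affinity score followed by ridge regression onto observed gains. Below is a minimal numpy sketch of that pipeline; the truncated-power cubic basis, knot placement, and regularization strength are my own illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

def spline_basis(x, knots):
    """Truncated-power cubic spline basis: [x, x^2, x^3, (x - k)^3_+ per knot].
    A stand-in for the paper's B-spline expansion (the exact basis is assumed)."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with intercept: w = (X'X + lam*I)^-1 X'y."""
    Xb = np.column_stack([np.ones(len(X)), X])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ w

# Toy data: affinity scores and synthetic MTL gains with a nonlinear relationship.
rng = np.random.default_rng(0)
affinity = rng.uniform(-1, 1, size=200)
gain = np.tanh(2 * affinity) + 0.05 * rng.standard_normal(200)

knots = [-0.5, 0.0, 0.5]
X = spline_basis(affinity, knots)
w = fit_ridge(X, gain, lam=1e-2)
pred = predict(w, X)
```

If this reading is accurate, every piece is a textbook component, which supports the view that the contribution rests on how the pieces are assembled rather than on any one of them.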
See weaknesses.

Fully AI-generated

Ensemble Prediction of Task Affinity for Efficient Multi-Task Learning
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary
This paper addresses a central challenge in multi-task learning (MTL): identifying groups of tasks that mutually improve each other’s performance when trained together. Existing methods typically fall into two categories — white-box and black-box approaches.
The paper introduces a hybrid method that integrates both. The proposed two-step framework first estimates pairwise task affinities by training a single MTL model on all tasks within a group, inferring affinities for all possible combinations. In the second step, a non-linear mapping from task affinity to performance gain is constructed and refined using residual predictions.
The method is evaluated against several state-of-the-art task grouping algorithms, with ablation studies highlighting the contribution of each component. Results demonstrate the benefits of applying non-linear mappings and residual predictions. Overall, the paper presents a robust and well-conceived methodology that successfully combines the strengths of existing approaches.
Strengths
Tackles an important challenge in the MTL literature — the high computational overhead of existing task-grouping algorithms. Through comprehensive ablation studies, the paper provides valuable insights into the role of gradient similarities (e.g., via comparison of affine vs. non-linear mappings).
Design choices (e.g., use of B-splines and regression techniques) are justified through ablation analyses and comparative experiments.
The paper is clearly structured and effectively relates its contributions to prior work, ensuring coherence and contextual grounding.
The experimental evaluation goes beyond measuring performance gains, also analyzing the correlation between predicted affinities and ground-truth transfer gains.
Weaknesses
While computational efficiency is claimed as a key advantage, it would be valuable to include results for the complete approach (including hyperparameter tuning in the second stage) or to explicitly state that the additional cost is negligible. Comparing against an additional data-driven baseline would strengthen the evaluation.
The performance gains over naive MTL are relatively modest; a discussion of their practical significance would help contextualize their value.
Some design choices lack clear theoretical justification (e.g., why averaging gradient similarities yields effective affinity estimates, or why B-splines are particularly suitable).
The proposed method introduces several additional hyperparameters (for the B-spline expansion, regression method, and residual prediction), which may complicate tuning and reproducibility.
Questions for the Authors
What is the rationale for time-averaging the cosine similarity over the K training steps in Equation (6)? How stable is this measure across training, and how much variability is typically observed?
Since TAG’s affinity scores are on a different scale than observed gains, how does your method correct for this discrepancy?
You mention that GRAD-TAE performs well for groups — could you include these results in Table 3 for completeness?
Could you provide additional details on the computational requirements of the different methods, particularly in terms of the number of backward passes and the extent of hyperparameter tuning involved?
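To make the first question above concrete, here is a minimal numpy sketch of what a time-averaged gradient cosine affinity could look like; the gradient shapes, the synthetic alignment strength (0.8), and the per-step averaging are illustrative assumptions about Equation (6), not a reproduction of it.

```python
import numpy as np

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def time_averaged_affinity(grads_i, grads_j):
    """Average per-step cosine similarity between two tasks' shared-parameter
    gradients over K training steps (one plausible reading of Eq. (6))."""
    return float(np.mean([cosine(gi, gj) for gi, gj in zip(grads_i, grads_j)]))

# Toy example: K=50 steps, 100 shared parameters; task j's gradients are
# partially aligned with task i's.
rng = np.random.default_rng(1)
K, d = 50, 100
grads_i = rng.standard_normal((K, d))
grads_j = 0.8 * grads_i + 0.6 * rng.standard_normal((K, d))

score = time_averaged_affinity(grads_i, grads_j)
per_step = [cosine(gi, gj) for gi, gj in zip(grads_i, grads_j)]
```

Reporting the spread of the per-step values (here, `np.std(per_step)`) alongside the averaged score would directly answer the stability question.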
Recommendation
Although some design choices could be better motivated and the performance gains over naive MTL are modest, the paper provides valuable insights into the problem of task grouping. The authors conduct detailed and well-structured experiments, including informative ablation studies that clarify the role of different components within the proposed framework.
Additional Feedback
Figure 1 is somewhat difficult to interpret, as it presents too many elements at once. Consider splitting it into two complementary figures: one conceptual illustration to convey the overall idea, and another outlining the algorithmic steps in more detail.
Section 4 could be improved for clarity and consistency. Ensure that all discussed results appear in the corresponding tables or figures, and avoid repetition of statements like “TAG achieves higher correlation but incurs significant computational overhead.”
The term “training cost” could be replaced with “computational cost” for improved clarity.
In Section 3.1.3, explicitly explain the difference with TAG and refer to the appendix where both affinity measures are compared.
The discussion of limitations could be expanded. As shown in the appendix, the method performs comparably to naive MTL in some cases, indicating room for improvement and further exploration in future work.
Lightly AI-edited |