ICLR 2026 - Reviews


Reviews

Summary Statistics

EditLens Prediction    Count      Avg Rating   Avg Confidence   Avg Length (chars)
Fully AI-generated     1 (25%)    6.00         3.00             3317
Heavily AI-edited      0 (0%)     N/A          N/A              N/A
Moderately AI-edited   1 (25%)    2.00         4.00             3587
Lightly AI-edited      2 (50%)    5.00         3.50             2470
Fully human-written    0 (0%)     N/A          N/A              N/A
Total                  4 (100%)   4.50         3.50             2961
Title Ratings Review Text EditLens Prediction
The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Soundness: 2: fair
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper investigates why model averaging in federated learning (FL) often causes a temporary drop in clients' local performance after aggregation. The authors observe that existing work treats this post-aggregation degradation as an inherent cost without explaining its internal mechanisms. To address this, they propose a layer-peeled analysis framework that examines how aggregation alters feature representations and their alignment with subsequent layers, introducing the concept of Cumulative Feature Degradation (CFD), a depth-accumulating degradation of feature quality and feature-parameter alignment. Through empirical analysis, they show that aggregation increases within-class variance, decreases between-class variance, and disrupts alignment between penultimate features and classifiers, while also improving out-of-distribution generalization.

Strengths:
1. The figure illustrating the layer-wise performance trend is well presented and effectively supports the analysis.
2. The experimental setup is described with sufficient clarity and detail to ensure reproducibility.

Weaknesses:
1. Limited novelty compared to prior layer-wise/feature-alignment analyses. Prior work already diagnoses aggregation-induced feature/layer misalignment and layer-dependent behavior, and studies when layer-wise averaging or alignment helps (e.g., Fed2 [1] aligns features across clients; pFedLA [2] learns layer-wise aggregation weights in a personalized FL setting; FedFA [3] analyzes latent feature statistics in detail and proposes a feature alignment method; Layer-wise Linear Mode Connectivity [4] shows that layers often admit linear connectivity, with a thorough analysis of layer-wise parameter dynamics under model aggregation). It is not clear what new insight this paper adds beyond those feature-alignment analyses.
2. CFD seems like a correlate, not a fundamental driver. The experiments mainly establish correlations between CFD metrics and accuracy without causal interventions; the "cumulative" phrasing is also puzzling because aggregation is a weight-averaging step (no depth-wise propagation), and the depth trend likely reflects local training signals rather than averaging per se.
3. The explanation is generic and widely known. The argument reduces to "within-class variance rises and between-class separation falls after averaging," which mirrors established neural-collapse/feature-separation results [5, 6] in standard deep nets; please clarify what is federation-specific beyond these generic patterns, or provide theory linking the degradation to the heterogeneity of the federated setting.

References:
[1] Yu, Fuxun, et al. "Fed2: Feature-aligned federated learning." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021.
[2] Ma, Xiaosong, et al. "Layer-wised model aggregation for personalized federated learning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
[3] Zhou, Tianfei, and Ender Konukoglu. "FedFA: Federated Feature Augmentation." The Eleventh International Conference on Learning Representations. 2023.
[4] Adilova, Linara, et al. "Layer-wise linear mode connectivity." The Twelfth International Conference on Learning Representations. 2024.
[5] Papyan, Vardan, X. Y. Han, and David L. Donoho. "Prevalence of neural collapse during the terminal phase of deep learning training." Proceedings of the National Academy of Sciences 117.40 (2020): 24652-24663.
[6] Parker, Liam, et al. "Neural collapse in the intermediate hidden layers of classification neural networks." arXiv preprint arXiv:2308.02760 (2023).

Questions: See weaknesses.

EditLens Prediction: Moderately AI-edited
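The within-/between-class variance pattern invoked in weakness 3 can be made concrete with a small sketch. The helper below is a hypothetical illustration of this kind of metric; the function name and the per-sample normalization are assumptions, not the paper's exact definitions:

```python
import numpy as np

def class_variances(features, labels):
    """Within-class and between-class variance of a feature matrix.

    features: (N, d) array of layer activations; labels: (N,) int array.
    Hypothetical metric in the spirit of the review's discussion: after
    aggregation, within-class variance is reported to rise and
    between-class variance to fall, i.e. classes become less separable.
    """
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()          # scatter around class mean
        between += len(fc) * ((mu_c - global_mean) ** 2).sum()  # class-mean scatter
    n = len(features)
    return within / n, between / n
```

On well-clustered features the between-class term dominates; a post-aggregation drop in that ratio is exactly the "generic" separation pattern the weakness compares to neural-collapse results.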
The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper presents a layer-peeled analysis framework to investigate performance degradation after model aggregation in federated learning (FL). The study shows that feature variance and feature-parameter alignment deteriorate as network depth increases, a phenomenon the paper refers to as cumulative feature degradation (CFD). The paper further demonstrates that model aggregation can improve feature generalization across clients. Finally, it analyzes how existing FL strategies mitigate the effects of CFD to achieve improved performance.

Strengths:
- Overall, the paper is well written, and the figures and explanations are clear and easy to follow.
- The paper provides a systematic set of metrics for analyzing the dynamics of features and model parameters in FL settings.

Weaknesses:
- The analysis mainly focuses on the proposed analytical metrics without presenting accompanying accuracy trends to support the findings. While the paper suggests that model aggregation may degrade performance, it does not clearly demonstrate how the performance drops correlate with the reported feature and parameter metrics and their dynamics.
- At Lines 273-279, the paper briefly introduces and defines CFD as the larger relative changes in the metrics as network depth increases. However, this definition is somewhat vague and lacks direct evidence that the degradation is indeed cumulative, since the observed relative changes may be influenced by various factors. For example, if performance is already low at earlier layers, the relative change may appear smaller simply because there is limited room for further degradation.
- The paper introduces CFD and uses it to analyze feature and parameter dynamics in FL, as well as to interpret the behavior of existing FL approaches. However, it is not entirely clear what concrete insights CFD offers that would meaningfully guide the design of future FL methods or lead to further advances in the field.

Questions: Besides the weaknesses above, please also see the following questions:

Q1: In Figures 2 and 3, the relative changes in feature variance increase with network depth. However, the feature variances in the shallow layers already perform poorly (e.g., large within-class variance and small between-class variance for "L1" in Figures 2(a) and 2(c)). In this case, can we still conclude that feature degradation becomes more severe in deeper layers? The smaller relative change observed in shallow layers may simply be due to their initially poor performance, rather than indicating less degradation.

Q2: In Figure 8, the personalization method FedBN still exhibits larger relative changes in the deeper layers than FedAvg. Does this imply that FedBN is less effective in mitigating CFD? Additionally, what is the accuracy comparison between FedAvg and FedBN?

Q3: At Lines 14-16 of the abstract, the paper states that performance drops after aggregation can potentially slow down the convergence of FL. However, in Section 4.4, the results indicate that model aggregation improves generalization. Why would improved generalization hinder convergence? Does this imply that without aggregation, the model would converge more quickly but to an overfitted local minimum? If so, would slower convergence in this case actually be preferable?

EditLens Prediction: Lightly AI-edited
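The baseline-dependence confound raised in Q1 is easy to see numerically. Below is a minimal sketch of a relative-change metric (the function name and epsilon guard are assumptions; the paper's exact formula is not given here):

```python
import numpy as np

def relative_change(before, after, eps=1e-12):
    """Per-layer relative change of a metric around an aggregation step.

    A small denominator (an already-poor shallow layer) inflates the
    ratio, so identical absolute drops can look very different per layer.
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    return (after - before) / (np.abs(before) + eps)
```

For instance, an absolute drop of 1.0 at two layers with pre-aggregation values 10.0 and 1.0 yields relative changes of roughly -0.1 and -1.0: the same degradation appears ten times "worse" at the layer with the smaller baseline, which is exactly why a depth-increasing relative change need not imply cumulative degradation.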
The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Soundness: 2: fair
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary: This paper investigates model aggregation in FL from a layer-wise feature perspective, proposing a layer-peeled analysis framework that offers a more interpretable lens on the internal dynamics of FL. The analysis reveals a key phenomenon termed Cumulative Feature Degradation (CFD), and the study further examines how different FL settings influence this degradation during model aggregation.

Strengths:
- The topic is relevant and important, and the construction of a layer-peeled feature analysis framework is helpful.
- The findings on CFD help explain why the performance drop is so pronounced and why it is a fundamental challenge in aggregating deep models.
- The analysis covers multiple datasets and model architectures to support the findings.

Weaknesses:
- The current analysis is primarily empirical, relying on experimental metrics. The paper would be strengthened by theoretical analysis that supports or generalizes the empirical findings.
- While the abstract and introduction highlight aggregation frequency as a key factor, the corresponding analysis is relegated to the appendix. Including it in the main text, along with a more detailed discussion, would be better.
- Although the paper successfully diagnoses a key issue (CFD), it does not propose concrete solutions or algorithmic adjustments inspired by the insights. The claim in the abstract that the work "potentially paves the way" for better FL algorithms would be more convincing if accompanied by specific, testable hypotheses or design principles.
- The figures and captions in the supplementary section could be further elaborated.

Questions: Please see the weaknesses section.

EditLens Prediction: Lightly AI-edited
The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary: This paper investigates the temporary performance drop seen in federated learning (FL) after models are aggregated, using a novel "layer-peeled" analysis framework to understand the root causes. The authors identify a phenomenon called Cumulative Feature Degradation (CFD), whereby aggregation progressively degrades feature quality and disrupts the alignment between features and parameters as network depth increases. This degradation, especially the mismatch between the final features and the classifier, is pinpointed as the main cause of the performance drop. Despite this downside, the study confirms that aggregation is vital for improving model generalization and preventing overfitting to local client data. The paper also uses this framework to explain why common FL solutions work, showing that methods such as parameter personalization, pre-trained initialization, and classifier fine-tuning are effective because they successfully mitigate the CFD effect.

Strengths:
1. The authors rigorously demonstrate that the negative impact of aggregation is not a uniform hit but a compounding problem that progressively accumulates with network depth.
2. The study offers a balanced perspective. It not only identifies the downsides of aggregation (CFD) but also validates its crucial upside, showing that aggregation is what enables the model to create more generalizable features and mitigate local overfitting.
3. The paper introduces a "layer-peeled" analysis framework that moves beyond standard accuracy or loss metrics.

Weaknesses:
1. The experimental setup involves a very small number of clients (e.g., 4 clients for PACS, 6 for DomainNet). This is not representative of typical cross-device FL scenarios, which can involve hundreds, thousands, or even millions of clients. The dynamics of averaging four or six models may be very different from averaging thousands, and it remains an open question whether the severity and behavior of CFD would scale, diminish, or change entirely in a massively federated setting.
2. The paper's conclusions about "model aggregation" are almost exclusively based on analyzing the FedAvg algorithm, which uses simple parameter-wise averaging. While FedAvg is a foundational baseline, the paper does not investigate whether the Cumulative Feature Degradation (CFD) phenomenon persists in more advanced FL algorithms designed specifically to combat aggregation problems (such as FedProx, SCAFFOLD, or FedDyn). It is possible that CFD is a specific artifact of the naive FedAvg approach rather than an unavoidable downside of all model aggregation in FL.
3. All experiments are conducted on image classification datasets (Digit-Five, PACS, and DomainNet) using standard vision architectures (CNNs and ViT). The findings, while significant for computer vision, cannot be assumed to generalize to other major applications of FL. It is unknown whether CFD manifests similarly in fundamentally different tasks, such as regression problems or classification on text datasets.

Questions:
1. Can the authors show that this analysis still holds for stronger FL algorithms such as SCAFFOLD, FedDyn, or pFedMe?
2. Do the authors have any theoretical justification for why the simple averaging of model parameters fundamentally leads to this progressive, layer-by-layer degradation in feature quality and alignment?

EditLens Prediction: Fully AI-generated
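For reference, the "simple parameter-wise averaging" that weakness 2 contrasts with FedProx/SCAFFOLD/FedDyn is the FedAvg aggregation step, which can be sketched as follows (a minimal illustration assuming client models are plain dicts of NumPy arrays, not the paper's implementation):

```python
import numpy as np

def fedavg(client_states, weights=None):
    """Weighted parameter-wise average of client state dicts (FedAvg-style).

    client_states: list of {param_name: np.ndarray}, one dict per client.
    weights: optional per-client weights (e.g. local sample counts);
    uniform averaging is used when None.
    """
    if weights is None:
        weights = [1.0] * len(client_states)
    total = float(sum(weights))
    # Average each parameter tensor independently across clients;
    # no layer interacts with any other during this step.
    return {
        name: sum(w * state[name] for w, state in zip(weights, client_states)) / total
        for name in client_states[0]
    }
```

Note that the averaging touches each tensor independently, which is why the review calls the "cumulative" depth effect puzzling: any depth-wise accumulation must arise from how the averaged weights compose at inference time, not from the aggregation step itself.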