|
DRDFL: Divide-and-conquer Collaboration for Efficient Ring-topology Decentralized Federated Learning |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper proposes DRDFL, a divide-and-conquer framework for Ring-topology Decentralized Federated Learning. The method aims to tackle data heterogeneity and sparse communication challenges that arise in peer-to-peer FL systems without a central server. DRDFL introduces two modules: 1) a transferable Learngene module that encapsulates consensus knowledge to address label distribution skew, and 2) a local PersonaNet module that mitigates feature distribution skew by modeling local personalized features via Gaussian mixtures.
1) Decentralized federated learning is a timely and important research direction, as it mitigates several inherent limitations of centralized FL, most notably the single point of failure and heavy reliance on a central coordinator. Addressing data heterogeneity among clients is also a key open challenge in this area, and this work takes a step toward that goal.
2) The paper is well written and is supported by a good number of experiments.
1) While the paper motivates the ring topology as a means to improve communication efficiency and remove reliance on a central server, the justification for specifically adopting a ring structure remains somewhat unconvincing. For instance, the introduction mentions vehicle-to-vehicle networks as a potential application. However, in such dynamic environments, client participation is highly transient, with vehicles continuously entering and leaving the network. It is therefore unclear how the proposed framework would ensure participation that is stable long enough to complete one full communication cycle around the ring, let alone several. In practice, decentralized systems could still achieve low communication cost through peer-to-peer communication within dynamic local neighborhoods rather than enforcing a fixed ring structure. Clarifying why a strict ring topology is chosen over more flexible peer-to-peer connectivity, and how DRDFL handles client disconnection in such dynamic scenarios, would strengthen the motivation and applicability of the approach.
2) All reported experiments appear to assume 100% client participation in each communication round within the ring topology. It would be valuable to understand how DRDFL performs under partial client participation. For instance, with 50 total clients, one could randomly sample small subsets (e.g., 5 clients) to form temporary rings per round, with different subsets participating in subsequent iterations.
3) The paper evaluates DRDFL under moderate non-IID settings, but it remains unclear how the method would perform in extreme heterogeneity scenarios, specifically where each client possesses a distinct set of class labels (i.e., no label overlap).
4) How would DRDFL behave under a linear (path) topology, e.g., A → B → C → D, where no cycle exists? In a directed path, upstream clients (e.g., A) never receive updated Learngene or class statistics from downstream nodes, so consensus information cannot propagate back.
5) The evaluation is restricted to ResNet-18 and small datasets, with all clients participating every round. This leaves open the question of how DRDFL scales to larger models and datasets. The study should be expanded accordingly, for example with ViT-B/16 and TinyImageNet.
6) The proposed framework requires each client to communicate class statistics for Learngene optimization. Does sharing the class distributions of clients not have privacy implications?
7) For fairness, all baselines and DRDFL should be compared under the same computation cost. The authors should report figures with computation cost on the x-axis and accuracy on the y-axis.
8) All experiments were performed under a homogeneous model architecture (ResNet-18) across clients, yet the comparison set omits decentralized model-averaging baselines that also use mechanisms such as re-Weighted SoftMax (WSM) cross-entropy [1] during training to mitigate data heterogeneity. Prior work such as DFML [2] has used such a setup as a baseline. Compare DRDFL with WSM in the decentralized FL setting.
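To make the partial-participation experiment suggested in point 2) concrete, a minimal sketch of per-round ring sampling could look as follows. The function name `sample_ring`, the seed, and the directed-edge representation are assumptions made here for illustration; they are not taken from the paper.

```python
import random

def sample_ring(client_ids, ring_size, rng):
    """Pick ring_size clients at random and return directed ring edges (sender, receiver)."""
    members = rng.sample(client_ids, ring_size)
    # Each member sends to the next one; the last member wraps around to the first.
    return [(members[i], members[(i + 1) % ring_size]) for i in range(ring_size)]

rng = random.Random(0)           # fixed seed only for reproducibility of the sketch
clients = list(range(50))        # 50 total clients, as in the example in point 2)
for round_idx in range(3):
    ring = sample_ring(clients, 5, rng)   # a temporary 5-client ring for this round
    print(round_idx, ring)
```

Each round draws a fresh subset, so a client that drops out simply is not sampled rather than breaking a fixed ring.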
[1] Legate et al. "Re-weighted softmax cross-entropy to control forgetting in federated learning". PMLR, 2023.
[2] Khalil et al. "DFML: Decentralized Federated Mutual Learning." TMLR 2024.
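The propagation issue raised in point 4) can be checked with a small reachability sketch. The client names follow the A → B → C → D example above; the helper functions are hypothetical and only model which clients' information can eventually arrive at a given node, not DRDFL itself.

```python
def reachable_from(start, edges):
    """Return the set of nodes reachable from start by following directed edges."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def sources_reaching(target, edges, nodes):
    """Nodes whose information can eventually arrive at target."""
    return sorted(n for n in nodes if target in reachable_from(n, edges))

nodes = ["A", "B", "C", "D"]
path = [("A", "B"), ("B", "C"), ("C", "D")]   # directed path, no cycle
ring = path + [("D", "A")]                    # closing edge turns the path into a ring

print(sources_reaching("A", path, nodes))  # [] : nothing ever flows back to A
print(sources_reaching("A", ring, nodes))  # ['A', 'B', 'C', 'D']
```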
Please address concerns in Weaknesses above. |
Fully human-written |
|
DRDFL: Divide-and-conquer Collaboration for Efficient Ring-topology Decentralized Federated Learning |
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
In this paper, the authors present DRDFL, a method for personalized, decentralized federated learning that aims to improve personal model performance via two modules. The paper decomposes representation learning into a private PersonaNet module to fix feature skew and a shared, adversarially-trained Learngene module to fix label skew. Their latents are combined for classification, while only the Learngene module and global Gaussian means and variances are communicated between clients. The paper compares their method against various SOTA baselines in the literature and across different heterogeneity measures. In terms of the paper’s strengths, they demonstrate that their method improves on or is comparable to the accuracy of other methods in most experiments. Furthermore, their method achieves these improvements with a much lower communication overhead between clients. However, the paper would benefit from a clearer motivation and definition of the personalized FL setting, as well as the clarification of minor details in the questions section.
* Overall, the paper is well-written, thorough, and generally persuasive.
* The proposed method demonstrates superior or comparable performance to existing baselines from the literature when evaluated on both global and local test accuracy across a range of heterogeneity conditions.
* The approach achieves roughly two orders of magnitude lower communication cost in terms of transmitted parameters compared to other centralized and decentralized federated learning methods.
* The authors also provide very extensive ablation studies and supplementary analyses that answered most questions that I had about the work. These include:
* Alternative communication topologies
* Computational overhead of DRDFL
* Convergence behavior
* Robustness under gradient reconstruction attacks
* Effects of removing individual loss components
* Performance with an increased number of clients
* Scalability to newly joined clients
* The first claimed contribution of the paper is the delineation of heterogeneity into feature skew and label skew. However, this distinction is already well-established in the federated learning literature (e.g., [1]) and therefore should not be presented as a novel contribution.
* The motivation for the personalized DFL objective is also insufficiently articulated. Additionally, it is unclear whether the Dirichlet-$\beta$ and shard-based partitioning schemes used in the experiments meaningfully represent realistic personalization scenarios. Intuitively, assessing model personalization with a Dirichlet partition over labels would encourage client models to learn solutions that are heavily biased toward the local label skew and generalize poorly. While this limitation may reflect broader challenges within the personalized FL subfield and extend beyond the scope of this particular work, the paper would nonetheless benefit from a more explicit discussion of these conceptual issues.
* The personalization objective should be defined more clearly and introduced earlier in the paper. This objective represents a key distinction from works such as DFedAvg, where the aim is to jointly train a single global model. While the authors implicitly reference this objective through the use of Local-T and Global-T curves, it would strengthen the manuscript to articulate it explicitly in the early sections.
* Instead of reporting “0.58 M of communication” as a standalone statistic in the abstract and contributions section, it would be more informative to contextualize this value by comparing it to state-of-the-art baselines from the start (e.g., indicating that the method transmits an order of magnitude fewer parameters). The communication cost is relative to the size of the backbone classifier, so the number alone is not very informative.
* The manuscript should specify which backbone classifier is used across all methods, as this detail is necessary for interpreting performance and communication comparisons.
* The qualitative analysis using Grad-CAM is an interesting direction. However, to substantiate the claims about what is learned by PersonaNet and Learngene, the authors should include at least a few more visual examples in the Appendix to better justify their interpretations. It is not clear whether the image presented is representative of the majority of the images.
* Why is the work limited to the ring topology? It is unclear why the method cannot be used on any DFL topology.
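To illustrate the concern above about Dirichlet-$\beta$ label partitions, the following is a minimal sketch of one common way such a split is built: for each class, client proportions are drawn from a symmetric Dirichlet distribution. The function name, the toy label set, and the choice of $\beta$ values are assumptions for illustration; the paper's exact partitioning code may differ.

```python
import random
from collections import Counter

def dirichlet_label_partition(labels, num_clients, beta, rng):
    """Assign sample indices to clients, class by class, using Dir(beta) proportions."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    client_indices = [[] for _ in range(num_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Normalised Gamma(beta, 1) draws give a symmetric Dirichlet(beta) sample.
        gammas = [rng.gammavariate(beta, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += gammas[c] / total
            # The last client takes the remainder so every sample is assigned once.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            client_indices[c].extend(idxs[start:end])
            start = end
    return client_indices

labels = [i % 10 for i in range(1000)]   # toy dataset: 10 balanced classes
for beta in (0.1, 0.5):
    parts = dirichlet_label_partition(labels, 5, beta, random.Random(0))
    for c, idxs in enumerate(parts):
        # Per-client label histogram; small beta tends to concentrate on few classes.
        print(beta, c, sorted(Counter(labels[i] for i in idxs).items()))
```

With small $\beta$, most clients end up dominated by a handful of classes, which is the regime where a personalized model can score well locally while generalizing poorly.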
References
[1] J. Pei, W. Liu, J. Li, L. Wang and C. Liu, "A Review of Federated Learning Methods in Heterogeneous Scenarios," in IEEE Transactions on Consumer Electronics, vol. 70, no. 3, pp. 5983-5999, Aug. 2024. |
Fully human-written |
|
DRDFL: Divide-and-conquer Collaboration for Efficient Ring-topology Decentralized Federated Learning |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The work introduces DRDFL, a framework for decentralized federated learning under ring-topology communication. The method decomposes learning into two modules: i) Learngene, which is trained adversarially under a uniform label distribution to capture global, class-invariant knowledge and help with label skew; and ii) PersonaNet, a personal network based on Gaussian mixture distributions that learns and preserves client-specific features and alleviates feature distribution skew. Clients share Learngene parameters and Gaussian statistics with neighbors to enable peer-to-peer learning.
1. The separation of the personalization and generalization concerns by having two separate modules for the peer to peer learning setting is new.
2. The use of adversarial training with uniform priors for Learngene and Gaussian mixture modeling for PersonaNet seems to effectively bridge the personalized representation learning with federated optimization.
3. The experiments on multiple datasets show superior performance of the proposed algorithm.
1. While the idea is novel for the peer to peer learning network, similar ideas have existed in the FL setups with access to a central server.
2. The paper is somewhat hard to follow and the writing could better describe the intuition of each component and design choices, for example how the adversarial uniform constraints achieve global invariance, etc.
3. Whether and how these two modules interact with each other is under-discussed.
1. How sensitive is the method to the hyperparameters controlling the balance between generalization and personalization losses? |
Fully human-written |
|
DRDFL: Divide-and-conquer Collaboration for Efficient Ring-topology Decentralized Federated Learning |
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
The paper proposes DRDFL, a novel framework for Decentralized Federated Learning (DFL) on a ring topology. The core idea is to tackle data heterogeneity by decoupling the model into two components: a personalized PersonaNet module and a consensus Learngene module. The PersonaNet is trained to capture client-specific feature distributions using a Gaussian mixture model. The Learngene, which is the only component communicated between clients, is trained to learn class-invariant, generalizable knowledge through an adversarial optimization process against a uniform label distribution. The authors claim that this "divide-and-conquer" approach achieves superior personalization and generalization performance with very low communication overhead (0.58M parameters) in a ring-topology setting.
1. The proposed architecture, which combines a VAE-like structure with a dual-branch encoder for personalization (PersonaNet) and generalization (Learngene), is novel in the context of DFL. The specific technique of using an adversarial classifier trained towards a uniform distribution to enforce class-invariance in the Learngene module is a clever and interesting contribution.
2. The paper addresses a highly relevant and challenging problem: efficient and effective DFL under sparse communication topologies (ring) and severe data heterogeneity. A method that can achieve strong performance with minimal communication cost would be a significant advancement for practical peer-to-peer applications like collaborative autonomous driving or edge IoT systems.
1. The proposed DRDFL framework is exceedingly complex. It involves a VAE-like structure with two separate encoders, a decoder, a primary classifier, and an adversarial classifier. The optimization involves minimizing a combination of at least seven different loss terms: reconstruction loss ($L_{rec}$), classification loss on original and noised data ($L_{ce}$), two losses for PersonaNet ($L_{cls}, L_{log}$), and three losses for Learngene ($L_{kl}, L_{adv}, L^u_{adv}$). This complexity raises serious questions about the method's practicality, stability, and sensitivity to hyperparameter tuning. The ablation study in Table 5 is insufficient as it only removes entire loss groups ($L_{PR}$ or $L_{GL}$), failing to justify the necessity of each individual component.
2. The convergence analysis provided in Appendix A.2-A.3, while a good effort, feels generic and disconnected from the core novelties of the method. Specifically, Assumption 3, which bounds the parameter variation of the Learngene module ($||\tilde{\phi}^t - \phi^t_m||^2_2 \leq \delta^2$), is a very strong and potentially unrealistic assumption. In a highly non-IID setting, the Learngene received from a neighbor with a drastically different data distribution could be very far from the client's current version, making $\delta^2$ large and rendering the convergence bound vacuous. The analysis does not seem to capture the dynamics of the adversarial training or the VAE objective, which are central to the proposed method.
3. The paper champions the ring topology for its communication efficiency but largely ignores its critical drawbacks. The primary issue is the slow information mixing time, which is linear in the number of clients ($O(M)$). Information from one client takes $M-1$ rounds to reach its other neighbor. This could severely hamper convergence speed in large-scale networks (e.g., M > 100), a limitation not explored in the experiments (which use M=20 and M=50). More importantly, the paper dismisses the critical issue of node failure as "future work". In a ring, a single disconnected client breaks the entire communication loop, halting the training process. This is a major practical failure point that undermines the viability of the proposed approach for real-world systems.
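The linear mixing time described in point 3 can be made concrete with a short sketch. It assumes a unidirectional ring in which information advances one hop per communication round; both the hop model and the function names are illustrative assumptions, not details confirmed by the paper.

```python
def hops_on_ring(src, dst, num_clients):
    """Hops needed for information to travel src -> dst on a unidirectional ring."""
    return (dst - src) % num_clients

def worst_case_hops(num_clients):
    # The farthest node from client 0 is its predecessor, M - 1 hops away.
    return max(hops_on_ring(0, d, num_clients) for d in range(num_clients))

for m in (20, 50, 100):
    print(m, worst_case_hops(m))  # prints 20 19, then 50 49, then 100 99
```

Under this model the delay grows linearly with the number of clients, so the M > 100 regime mentioned above is where the effect on convergence speed would be most visible.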
1. The system is a complex amalgamation of multiple techniques (VAE, GMM, adversarial learning, noisy reconstruction). Could the authors provide a more detailed ablation study to justify the inclusion of each component? For instance, what is the performance impact of removing the noisy reconstruction, or using only the KL-divergence for the Learngene without the adversarial component? How sensitive is the model to the relative weighting of the numerous loss terms?
2. Regarding Assumption 3 in your convergence proof, how do you justify that the norm difference between the received and local Learngene module is bounded by a small constant $\delta^2$, especially in the early stages of training or under extreme label skew where neighboring clients might have completely disjoint class sets? |
Fully AI-generated |