A Systematic Literature Review of Robust Federated Learning: Issues, Solutions, and Future Research Directions


MD PALASH UDDIN, School of Information Technology, Deakin University, Burwood, Australia
YONG XIANG, School of Information Technology, Deakin University, Burwood, Australia
MAHMUDUL HASAN, School of Information Technology, Deakin University, Burwood, Australia
JUN BAI, School of Information Technology, Deakin University, Burwood, Australia
YAO ZHAO, Deakin University, Burwood, Australia
LONGXIANG GAO, Key Laboratory of Computing Power Network and Information Security, Ministry
of Education, Shandong Computer Science Center, Shandong Academy of Sciences, Qilu University of Tech-
nology, Jinan, China and Shandong Provincial Key Laboratory of Computing Power Internet and Service
Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China

Federated Learning (FL) has emerged as a promising paradigm for training machine learning models across
distributed devices while preserving their data privacy. However, the robustness of FL models against adversarial
data and model attacks, noisy updates, and label-flipped data issues remains a critical concern. In this
article, we present a systematic literature review using the PRISMA framework to comprehensively analyze
existing research on robust FL. Through a rigorous selection process using six key databases (ACM Digital
Library, IEEE Xplore, ScienceDirect, Springer, Web of Science, and Scopus), we identify and categorize 244
studies into eight themes of ensuring robustness in FL: objective regularization, optimizer modification, dif-
ferential privacy employment, additional dataset requirement and decentralization orchestration, manifold,
client selection, new aggregation algorithms, and aggregation hyperparameter tuning. We synthesize the findings
from these themes, highlighting the various approaches proposed to enhance the robustness of FL models and
their potential gaps. Furthermore, we discuss future research directions, focusing on the potential of
hybrid approaches, ensemble techniques, and adaptive mechanisms for addressing the challenges associated
with robust FL. This review not only provides a comprehensive overview of the state-of-the-art in robust FL
but also serves as a roadmap for researchers and practitioners seeking to advance the field and develop more
robust and resilient FL systems.

CCS Concepts: • Security and privacy → Software and application security; Systems security; Social
aspects of security and privacy;

This work was partly supported by the Australian Research Council under grant DP220100983.
Authors’ Contact Information: Md Palash Uddin, School of Information Technology, Deakin University, Burwood, Victo-
ria, Australia; e-mail: [Link]@[Link]; Yong Xiang (Corresponding author), School of Information Technology,
Deakin University, Burwood, Victoria, Australia; e-mail: [Link]@[Link]; Mahmudul Hasan, School of Infor-
mation Technology, Deakin University, Burwood, Victoria, Australia; email: s223230033@[Link]; Jun Bai, School of
Information Technology, Deakin University, Burwood, Victoria, Australia; e-mail: baijun@[Link]; Yao Zhao, Deakin
University, Burwood, Victoria, Australia; e-mail: cnr@[Link]; Longxiang Gao, Key Laboratory of Computing Power
Network and Information Security, Ministry of Education, Shandong Computer Science Center, Shandong Academy of Sci-
ences, Qilu University of Technology, Jinan, Shandong, China and Shandong Provincial Key Laboratory of Computing
Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, Shandong,
China; e-mail: gaolx@[Link].

This work is licensed under a Creative Commons Attribution 4.0 International License.
© 2025 Copyright held by the owner/author(s).
ACM 0360-0300/2025/05-ART245
[Link]

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.

Additional Key Words and Phrases: Federated learning, robustness, attack, countermeasure, defence
ACM Reference Format:
Md Palash Uddin, Yong Xiang, Mahmudul Hasan, Jun Bai, Yao Zhao, and Longxiang Gao. 2025. A Systematic
Literature Review of Robust Federated Learning: Issues, Solutions, and Future Research Directions. ACM
Comput. Surv. 57, 10, Article 245 (May 2025), 62 pages. [Link]

1 Introduction
With the advent of the intelligent era, an increasing number of diverse industries, including smart
health [121], smart transportation [197], smart manufacturing [98], and smart education [177],
have been capitalizing on the potent advantages of Artificial Intelligence (AI) to facilitate in-
dustry upgrading. Concurrently, as the digital revolution advances, human society has entered
the realm of Industrial Revolution 4.0 (IR 4.0). This era is characterized by augmented inter-
connections between individuals and the physical world through a proliferating human-machine
interface interaction [263]. At the crux of propelling high-quality development within IR 4.0, the
progress and breakthroughs in intelligent technologies assume a pivotal role in augmenting the
overall intelligence quotient of human society [3, 12]. Consequently, Machine Learning (ML), as
the primary technique of AI, has garnered significant attention from both academia and industry,
fostering an unprecedented flourishing of related learning technologies [9, 194].
ML aims to enable computer systems to automatically learn and improve through experien-
tial data, blending computer science and statistical principles. Over the past decades, advance-
ments in ML have led to four main paradigms: unsupervised, semi-supervised, supervised, and
reinforcement learning [86]. Early ML methods primarily relied on unsupervised techniques, like
clustering and anomaly detection, and supervised techniques, such as linear/logistic regression,
decision trees, and random forests, for tasks such as house price prediction and digit recogni-
tion [21]. However, these methods often required manual feature engineering and task-specific
objective functions, complicating development [35]. To overcome these challenges, deep learn-
ing (DL) emerged as a powerful supervised learning framework with enhanced learning capacity
and simplified model design [86]. DL utilizes layered network architectures to learn abstract repre-
sentations, particularly useful for complex data [37]. With advancements in computing hardware,
such as GPUs, DL has achieved significant success in tasks such as face and speech recognition,
autonomous driving, and virtual reality [164, 215]. Despite its strengths, DL’s reliance on large
labeled datasets remains a key limitation, exemplified by the 45 terabytes of data used to train
GPT-3 [250], making data scarcity a critical obstacle for further progress.
The potential remedy to address the scarcity of data for DL is found in the rapid proliferation of
interconnected Internet of Things (IoT) devices, smartphones, and a diverse array of websites.
Each individual end device has the capacity to function as a distinct data source, contributing a
substantial volume of data on a daily basis, thereby leading to the emergence of “big data” in a dis-
tributed form [217]. This phenomenon has attracted the attention of ML researchers, prompting
the development of distributed learning methodologies. Within the traditional distributed learning
framework, one or multiple computing nodes, referred to as servers, aggregate the data generated
by devices within the distributed network and subsequently undertake centralized training and
inference of ML algorithms [28]. The primary advantage of this distributed learning paradigm
lies in its potential to access a significant amount of data distributed across numerous end de-
vices. However, this approach is not without its limitations, encompassing the burden of heavy
storage and computational workloads imposed on servers, elevated communication costs entailed
in transferring data between the server and devices, and, most notably, the inherent risk of data
leakage [182].

[Figure 1 diagram: a central server connected to clients 1 through k. Labeled steps: Step 0, global model initialization; Step 1, client model aggregation; Step 2, client selection; Step 3, global model broadcast; Step 4, client model training (on each client); Step 5, client model upload.]

Fig. 1. Overview of the standard FL workflow predominantly comprising five key steps. Step 0 serves as the
initial phase, occurring at the inception of FL. The subsequent processes from Step 1 to Step 5 entail iterative
repetitions until the global model achieves a satisfactory convergence state.

The escalating frequency of data breaches in recent years has sparked significant concerns
among users and government regulatory bodies [31]. Consequently, a series of laws and regu-
lations, such as the European Union’s General Data Protection Regulation (GDPR) [75] and
the United States’ California Consumer Privacy Act (CCPA) [13], have been enacted to safe-
guard users’ privacy and protect sensitive data. However, while these legislative measures aim to
enhance data security, they have inadvertently created challenges in the collection of data from
smart devices or particular institutions. This situation has resulted in the emergence of a so-called
“data island” phenomenon, where private data stored in decentralized end devices becomes difficult
to access for AI researchers, thereby impeding intelligent development across various industries
[136]. As such, there arises an urgent necessity for the implementation of innovative distributed
and decentralized learning technologies to effectively address the prevalent “data hungry” predica-
ment faced within the AI research and application community [1].
To reconcile the contradiction between “data island” and “data hungry” challenges, a natural
approach is to leverage the computational capabilities of edge devices for conducting on-device
model training locally. This strategy obviates the need to transmit privacy-sensitive data between
local devices and a central server, consequently reducing the risk of privacy infringement. Therefore,
to enable access to data generated by private edge devices within the IoT, without compromising
user privacy, a novel communication-efficient and privacy-preserving decentralized ML
framework known as Federated Learning (FL) was introduced in Reference [161]. In FL, a multitude
of devices within the federated network collectively participate in the learning process of a
globally shared deep model, coordinated by a trusted server. The objective is to ensure that the
learned global model benefits each client device. The fundamental workflow of FL, as depicted
in Figure 1, unfolds as follows: Initially, the server initializes a global model and broadcasts it to
randomly selected available client devices. Upon receiving the global model, each client utilizes it
to initialize its local model and performs iterative learning updates on its own private data. Subse-
quently, the updated local models are transmitted back to the server. Finally, the server employs an
aggregation method to fuse all client models into a new global model. These iterative joint learn-
ing steps are repeated across multiple communication rounds until the final global model attains
satisfactory performance.
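
The workflow above can be illustrated with a minimal sketch of one communication round using size-weighted averaging. Everything here is an assumption made for illustration: models are flat lists of floats, each client holds a list of scalar samples, and `local_train`, `run_round`, and `weighted_average` are hypothetical helper names rather than an API from the reviewed studies.

```python
import random
from typing import List, Sequence

Model = List[float]    # assumption: a model is a flat list of weights
Dataset = List[float]  # assumption: each client holds scalar samples


def weighted_average(models: Sequence[Model], sizes: Sequence[int]) -> Model:
    """Element-wise average of client models, weighted by local dataset size."""
    total = sum(sizes)
    dim = len(models[0])
    return [sum(m[i] * n for m, n in zip(models, sizes)) / total for i in range(dim)]


def local_train(model: Model, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Model:
    """Toy local update: gradient descent on the mean squared error to local samples."""
    w = model[0]
    for _ in range(epochs):
        grad = sum(2.0 * (w - x) for x in data) / len(data)
        w -= lr * grad
    return [w]


def run_round(global_model: Model, clients: Sequence[Dataset],
              fraction: float, rng: random.Random) -> Model:
    """One round: select clients, broadcast, train locally, upload, aggregate."""
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(list(clients), k)
    # Each selected client starts from a copy of the broadcast global model.
    local_models = [local_train(list(global_model), data) for data in selected]
    sizes = [len(data) for data in selected]
    return weighted_average(local_models, sizes)


# Toy federation: two clients with different local data (hypothetical values).
clients = [[1.0, 1.0], [3.0, 3.0, 3.0, 3.0]]
model = [0.0]
rng = random.Random(0)
for _ in range(20):
    model = run_round(model, clients, fraction=1.0, rng=rng)
```

In this toy setting the global weight approaches the mean of all samples pooled across clients, even though no client ever shares its raw data with the server.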
FL has emerged as a promising research direction in the field of distributed DL, distinguishing
itself from conventional distributed learning through two key features. First, FL prioritizes users’
privacy protection by restricting access to client raw data, in contrast to conventional distributed
learning, which primarily focuses on accelerating model training. Second, FL shifts the entirety of
model training to individual end devices, whereas distributed learning heavily relies on the com-
putational and storage resources of a central server. Leveraging the distribution of model training
across diverse edge devices offers several advantages, including efficient reduction of communi-
cation costs, enhanced data privacy and security, optimal utilization of network device resources,
and the facilitation of inter/cross-domain knowledge sharing. Owing to these attributes, FL has
garnered favor among privacy-sensitive industries such as healthcare, social media, and trans-
portation, where data is dispersed across geographically isolated devices [73, 76, 127, 195, 258]. Its
potential applications encompass a wide range of domains, such as human activity recognition
[10, 29], disease prediction [109, 180], sentiment analysis [191], and so on. Beyond applications
within the same domain, researchers have explored the utilization of a novel integrated technique
known as federated transfer learning, aimed at facilitating cross-domain knowledge sharing and
collaboration. This becomes particularly relevant when there is minimal overlap in either users or
user features between datasets originating from disparate domains [92, 143].
While FL presents a remarkable distributed learning paradigm that upholds data privacy, it
encounters preliminary issues arising from its distinctive operational approach. First, FL’s pri-
mary objective is to train a global model that caters to each federated participant. However, unlike
conventional distributed learning, which enjoys comprehensive access to data distribution across
clients, FL faces the limitation of incomplete perception of such global data distribution. This char-
acteristic introduces heightened optimization issues for FL algorithms. Of particular significance
is the statistical heterogeneity of federated data distribution, stemming from the heterogeneity of
participating devices with varying storage and computation capabilities, as well as diverse user op-
eration habits. Such statistical heterogeneity undermines the preliminary assumption of indepen-
dently and identically distributed (IID) data, presenting a hurdle for traditional ML optimization.
Second, although raw data transfer is unnecessary in the federated network, multiple communi-
cation rounds are still indispensable to exchange model gradients between clients and the server.
These rounds ensure the synchronization of global model gradient updates and the achievement of
global model convergence. Consequently, elevated communication costs are a significant obstacle
to the practical application of FL. Because these issues are the preliminary challenges of the FL
scheme, numerous works have sought to alleviate them since FL first emerged
(see the following surveys: [87, 93, 133, 158, 171, 185, 192, 245, 275, 276, 282, 316, 317] for detailed
insights).
To address these inherent challenges in FL, researchers have turned to the development of robust
FL [56] that aims to strengthen the FL framework by enhancing its resilience against adversarial
threats and unreliable participants. Particularly, robust FL seeks to mitigate the risks posed by
malicious clients, noisy updates, and inconsistent data by employing advanced aggregation meth-
ods, anomaly detection, and robust optimization techniques [46]. By incorporating robustness into
the FL paradigm, researchers aim to create models that can maintain high accuracy and reliability,
even in hostile or unpredictable environments. This addition of robustness is critical for real-world
applications of FL, where adversarial conditions are not only possible but increasingly likely [36].
Presently, robustness has become a pressing issue in FL. It arises from the decentralized
nature of the training process: multiple client devices collaboratively contribute to model
training while keeping their data local, so the FL server has no visibility into each client's
training process or into potential malicious attacks on client data, client models, and local training. This
collaborative setup introduces several vulnerabilities and threats that can undermine the integrity
and performance of the globally trained model. Specifically, the challenge encompasses various fac-
tors as follows: (i) Adversarial data and model attacks: Clients can be compromised or malicious,
introducing adversarial data that is intentionally crafted to deceive the model. Similarly, malicious
clients can contribute harmful model updates to compromise the global model’s performance or
inject malicious code. (ii) Noisy updates: The training data from clients might contain noise for
various reasons, such as measurement errors or data transmission issues. This noise can lead to
unreliable model updates, affecting convergence and overall model quality. (iii) Label-flipped data:
Mislabeling of data by clients can lead to inconsistencies and confusion during the aggregation process.
This is particularly challenging when different clients have varying label distributions. We
contend that fortifying FL against malicious attacks constitutes a compelling research trajectory
to establish robustness for real-world applications. Consequently, this article shall delve into exist-
ing technical advancements, thereby facilitating the delineation of a comprehensive panorama of
robust FL. This, in turn, shall steer more researchers toward exploring the tenets of FL robustness.
As a seminal work in this area, Federated Averaging (FedAvg) [161] introduced the concept
of increasing local training and adopting weighted averaging for client models within the stochas-
tic gradient descent (SGD)-based federated training framework. However, FedAvg demonstrates
suboptimal performance when confronted with non-IID data, as subsequent FL studies have ex-
perimentally and theoretically demonstrated its unstable convergence or even divergence charac-
teristics under highly skewed non-IID scenarios [123, 307]. Moreover, FedAvg-based methods are
susceptible to malicious attacks, particularly when adversaries impersonate benign clients to par-
ticipate in joint training. These malicious clients can launch various attacks, such as contaminating
labels or features of raw data and tampering with client gradient updates, thereby compromising
the utility of the global model [45, 107, 209, 227, 285]. It is noteworthy that conventional attack
detection methods designed for centralized ML are inapplicable in FL settings due to the lack of
access to client data by the FL server. Consequently, the quest for achieving robust FL has emerged
as a prominent research focus within the FL community, serving as the primary concern of our
study. Addressing these challenges requires innovative approaches that go beyond conventional
ML techniques. Researchers are exploring a range of strategies, including modifying optimization
algorithms, integrating privacy-preserving mechanisms, developing robust aggregation methods,
and enhancing client selection strategies. These approaches collectively aim to enhance the robust-
ness of FL models against adversarial attacks, noisy data, and other vulnerabilities, while maintain-
ing the privacy and efficiency benefits of the decentralized learning paradigm.
To bolster the robustness of FL, several diverse strategies have been explored and attempted to
address this challenge. A considerable portion of studies [67, 117, 131, 259, 274, 315] have concen-
trated on devising secure and robust model aggregation methods to effectively counter malicious
attacks and accommodate heterogeneous data distribution. The commonly adopted element-wise
weighted averaging approach has shown limitations in accurately distinguishing client contribu-
tions and credibility, leading to inadequate fairness and security in the fusion of client models. As a
significant technical avenue, another approach involves modifying the client-side training scheme
to align the local training objective with the global objective. This alignment aims to rectify issues
related to client gradient drift or abnormal gradient updates [23, 40, 44, 66, 96, 323]. Additionally,
a few research works adopt a dual-pronged approach to further fortify the robustness of the FL
optimization process [11, 110, 118, 184, 246, 313].
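
To illustrate why plain element-wise averaging is fragile, the sketch below compares it with a coordinate-wise median, a well-known robust aggregation rule. This is an illustrative choice on our part and not necessarily the method of any study cited above; the update values and the function names `mean_aggregate` and `median_aggregate` are hypothetical.

```python
from statistics import median
from typing import List, Sequence

Update = List[float]  # assumption: a client update is a flat list of floats


def mean_aggregate(updates: Sequence[Update]) -> Update:
    """Unweighted element-wise mean of client updates."""
    dim = len(updates[0])
    return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]


def median_aggregate(updates: Sequence[Update]) -> Update:
    """Coordinate-wise median: each coordinate ignores extreme outliers."""
    dim = len(updates[0])
    return [median(u[i] for u in updates) for i in range(dim)]


# Four benign clients roughly agree; one malicious client sends a huge update.
benign = [[1.0, -1.0], [1.1, -0.9], [0.9, -1.1], [1.0, -1.0]]
poisoned = [[100.0, 100.0]]
updates = benign + poisoned

mean_result = mean_aggregate(updates)      # dragged far toward the attacker
median_result = median_aggregate(updates)  # stays at the benign consensus
```

A single outlier shifts the mean by an amount proportional to its magnitude, whereas the median stays within the range of the benign values as long as benign clients form a majority in each coordinate.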
To the best of our knowledge, comprehensive surveys addressing the topic of robust FL are
scarce. In comparison to existing surveys and reviews on robust FL, our systematic literature review (SLR) stands out in several
ways. First, our SLR uses the Preferred Reporting Items for Systematic Reviews and Meta-
Analyses (PRISMA) framework to meticulously select and categorize 244 studies into eight
distinct themes. This exhaustive coverage ensures that we consider a wide range of approaches
and research on robust FL. Second, we provide an in-depth analysis of each theme, discussing the
various methods and techniques proposed to enhance the robustness of FL models and identifying
their potential gaps. This detailed examination allows readers to gain a comprehensive under-
standing of the state-of-the-art in this field. Third, unlike many existing surveys, we go beyond


Table 1. Comparison between Our SLR and Prior Survey Efforts Reveals Its Comprehensiveness,
Up-to-date Nature, and Incorporation of Diverse Viewpoints

Study | Year | Type of Survey | No. of Search Databases | Bibliometric Analysis | Thematic Analysis | No. of Studies | No. of Issues to Hamper Robustness | No. of Strategies to Attain Robustness
[213] | 2022 | Unstructured Review | 1 | No | No | 8 | 1 | 2
[113] | 2020 | Unstructured Review | 1 | No | Yes | 40 | 3 | 3
[289] | 2022 | Unstructured Survey | 1 | No | Yes | 24 | 4 | 3
[153] | 2020 | Unstructured Survey | 1 | No | Yes | 18 | 6 | 0
[152] | 2022 | Unstructured Survey | 1 | No | No | 22 | 2 | 2
[186] | 2021 | SLR (not on robust FL) | - | - | - | - | - | -
[324] | 2021 | Unstructured Survey (not on robust FL) | - | - | - | - | - | -
[205] | 2024 | Unstructured Survey | 1 | No | Yes | 33 | 3 | 6
[294] | 2024 | Unstructured Survey | 1 | No | Yes | 31 | 3 | 3
[252] | 2024 | Unstructured Survey | 1 | No | Yes | 39 | 2 | 4
[283] | 2024 | Unstructured Survey | 1 | No | Yes | 46 | 3 | 2
[6] | 2023 | Unstructured Survey | 1 | No | No | 14 | 2 | 2
[94] | 2023 | Unstructured Survey | 1 | No | No | 50 | 1 | -
Ours | 2025 | SLR | 6 | Yes | Yes | 244 | 8 | 8

summarizing existing work. We discuss future research directions, highlighting the potential of
hybrid approaches, ensemble techniques, and adaptive mechanisms. This forward-looking perspec-
tive is valuable for researchers and practitioners seeking to advance the field. Fourth, we include
a bibliometric analysis as an additional contribution, shedding light on the trends and research
landscape in robust FL. Overall, our SLR goes beyond existing surveys by offering a comprehen-
sive, up-to-date, and multi-perspective overview of the robustness challenges and solutions in FL,
as condensed in Table 1. To this end, we provide the list of contributions to our work as follows:
— Comprehensive Literature Review: While many survey papers provide a review of the
literature, our work distinguishes itself by adhering to the rigorous PRISMA framework,
which ensures systematic selection, categorization, and analysis of 244 studies, providing
exhaustive coverage (from 2016.01.01 to 2024.08.26) across eight distinct themes related to robust
FL. This structured approach contrasts with more generalized reviews, which often lack such
comprehensive thematic analysis.
— Bibliometric Analysis: Our inclusion of bibliometric analysis goes beyond simply cate-
gorizing studies. It provides insights into trends such as the year-wise publication trends,
geographical distribution of research, and key contributors in the field. This analysis offers
unique value by identifying emerging research hubs and collaboration patterns that are typ-
ically overlooked in conventional survey papers.
— Categorization of Research Themes: Unlike many surveys that merely summarize research
efforts, we categorize the selected studies into eight themes of ensuring robustness in
FL, namely, Objective Regularization, Optimizer Modification, Differential Privacy Employment,
Additional Dataset Requirement and Decentralization Orchestration, Manifold, Client Selection,
New Aggregation Algorithms, and Aggregation Hyperparameter Tuning. This thematic
organization provides a novel way for researchers
to navigate the diverse solutions proposed in the field and easily identify methods
relevant to their specific use cases.
— Synthesized Thematic Findings: The synthesis of findings within each theme, accom-
panied by potential research gaps, sets our work apart by offering a holistic view of the
strengths and limitations of existing FL techniques. This approach goes beyond a summary
of prior works, providing a roadmap for how future research can address the identified gaps.
— Future Research Directions: While it is common for survey papers to propose future re-
search directions, our review identifies concrete and underexplored areas, such as hybrid
approaches, ensemble techniques, and adaptive mechanisms. This forward-looking analysis
is grounded in a critical assessment of the literature, offering a uniquely actionable set of
future challenges and opportunities for robust FL.
The rest of the article is organized as follows: Section 2 introduces the preliminaries of robust FL.
In Section 3, we describe the methodology used
for conducting the SLR. We introduce the PRISMA framework, which guides the review process,
including study selection, data extraction, and categorization. Section 4 analyzes the bibliometric
descriptive results of the selected studies in the SLR. We present the results of the SLR by catego-
rizing the selected studies into eight thematic areas in Section 5. Building upon the synthesized
thematic findings, Section 6 outlines future research directions to enhance the robustness of FL
models. The article concludes by summarizing the contributions of the SLR in Section 7.

2 Preliminaries of Robust FL
This section introduces the fundamental concepts of robust FL, covers the key challenges, adversar-
ial attacks and noisy updates, and outlines evaluation metrics to assess performance. Additionally,
real-world case studies are presented to highlight the practical applications of robust FL.

2.1 Introduction to Robust FL


Robust FL refers to an enhancement of FL that focuses on ensuring the security and stability of
models trained across decentralized clients. Robust FL aims to address the inherent vulnerabilities
in FL, particularly in environments where data privacy is critical but malicious or unreliable be-
havior by some clients can undermine the model’s integrity [36, 46, 56]. By improving the system’s
resilience against adversarial attacks, noisy updates, and corrupted labels, robust FL ensures that
the global model can maintain high performance and security even when faced with unreliable
participants. Robust FL encounters several specific challenges that threaten its robustness. Adver-
sarial Data and Model Attacks: Malicious clients may deliberately manipulate their local data or
model updates to degrade the performance of the global model. This includes injecting poisoned
data or manipulating gradients to skew the aggregation process, making it harder for the model
to converge to a reliable state [46, 295]. Noisy Updates: Clients may unintentionally send noisy
or unreliable updates due to factors such as limited computational resources, unstable network
conditions, or inaccurate local training. These noisy updates can lead to suboptimal model per-
formance if not properly handled during aggregation [27, 46]. Label-flipped Data: Some clients
may possess or deliberately introduce label-flipped data, where correct labels are intentionally or
mistakenly altered. This can significantly impact the model’s ability to generalize and result in
poor decision-making, particularly in critical applications such as healthcare or finance [80, 84].
By addressing these challenges, robust FL aims to create a more secure, reliable FL system that is
robust to both intentional adversarial activities and accidental client-side errors.
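
The three challenges above can be simulated in a few lines, which is how robustness experiments often construct faulty or malicious clients. The sketch below is an assumption-laden illustration: the helper names `flip_labels`, `add_gaussian_noise`, and `scale_update`, the flip rate, and the scaling factor are hypothetical and not taken from the cited works.

```python
import random
from typing import List


def flip_labels(labels: List[int], source: int, target: int,
                rate: float, rng: random.Random) -> List[int]:
    """Label flipping: relabel a fraction `rate` of `source` samples as `target`."""
    return [target if (y == source and rng.random() < rate) else y for y in labels]


def add_gaussian_noise(update: List[float], sigma: float,
                       rng: random.Random) -> List[float]:
    """Noisy update: perturb each coordinate with zero-mean Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in update]


def scale_update(update: List[float], factor: float) -> List[float]:
    """Model attack: scale (and possibly sign-flip) an update to skew aggregation."""
    return [factor * w for w in update]


rng = random.Random(42)
labels = [1, 7, 1, 3, 1]
flipped = flip_labels(labels, source=1, target=7, rate=1.0, rng=rng)  # all 1s become 7s
noisy = add_gaussian_noise([0.5, -0.2], sigma=0.1, rng=rng)
attacked = scale_update([0.5, -0.2], factor=-5.0)
```

With `rate=1.0` every sample of the source class is relabeled; lower rates model partial or accidental mislabeling, and `sigma` controls how unreliable a noisy update is.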

2.2 Evaluation Metrics for Robust FL


In robust FL, several quantifiable metrics are used to evaluate the system’s performance and se-
curity. Testing accuracy is measured by the federated global model’s ability to correctly predict
outcomes on a validation set. However, robustness is typically assessed by testing the federated
global model under adversarial conditions, such as injecting poisoned updates or noisy data, and
observing the degradation in performance [287]. A key metric here is robust accuracy, which com-
pares the global model’s performance when all clients are benign versus when some clients are
adversarial. The difference in performance, known as robustness training loss, helps quantify the
system’s resilience to attacks [74]. Privacy leakage is measured using techniques like differential
privacy, where the privacy budget, denoted by ϵ, defines the upper limit on how much an adversary
can infer about any client’s data [189]. Convergence rates are tracked by counting the number of
communication rounds or iterations needed for the federated model to stabilize or achieve a target
accuracy, both under normal and adversarial conditions [130]. These metrics together offer a clear
understanding of the robustness and security of robust FL systems [78, 294].
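
As a rough sketch of how some of these metrics might be computed, the functions below measure testing accuracy, the accuracy gap between a benign run and an attacked run, and the number of rounds needed to reach a target accuracy. The function names and the sample numbers are hypothetical, and the exact definitions used in the cited studies may differ.

```python
from typing import List, Optional, Sequence


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of evaluation samples predicted correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_drop(benign_acc: float, attacked_acc: float) -> float:
    """Gap between accuracy with all-benign clients and with some adversarial clients."""
    return benign_acc - attacked_acc


def rounds_to_target(acc_per_round: List[float], target: float) -> Optional[int]:
    """First communication round (1-indexed) reaching `target`, or None if never reached."""
    for round_idx, acc in enumerate(acc_per_round, start=1):
        if acc >= target:
            return round_idx
    return None


# Hypothetical accuracy curves for a benign run and an attacked run.
benign_curve = [0.30, 0.60, 0.80, 0.85]
attacked_curve = [0.25, 0.40, 0.55, 0.60]

drop = accuracy_drop(benign_curve[-1], attacked_curve[-1])
benign_rounds = rounds_to_target(benign_curve, target=0.80)
attacked_rounds = rounds_to_target(attacked_curve, target=0.80)
```

A smaller accuracy gap and a similar rounds-to-target figure under attack both indicate a more robust training procedure; a return value of `None` signals that the attacked run never reached the target.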

2.3 Case Studies in Robust FL


The real-world applications of robust FL demonstrate its critical role in various sectors where
privacy, security, and robustness are essential. Robust FL enables organizations to train ML mod-
els collaboratively while ensuring that individual data remains private and the model is resilient
against adversarial attacks, unreliable updates, and data corruption. Below, we explore how robust
FL is being utilized in practical settings.
In the healthcare industry, privacy and security of sensitive patient data are of utmost importance.
Robust FL allows multiple healthcare institutions, such as hospitals and research centers,
to collaboratively train predictive models on patient data without exposing the data itself. This
is particularly beneficial for tasks like disease diagnosis, where models can be trained using a
wider variety of data from different hospitals, improving accuracy and generalization. At the same
time, robust FL ensures that adversarial clients, whether intentionally malicious or simply faulty,
cannot compromise the model’s integrity. This balance of data privacy and robustness makes robust
FL an ideal solution for sensitive applications in healthcare [97, 172].
In autonomous vehicle systems, large-scale ML models are crucial for decision-making in complex
and dynamic environments. Robust FL provides a way for manufacturers to jointly develop these
models by sharing model updates rather than proprietary data, ensuring intellectual property
protection. However, the potential for adversarial attacks, such as model poisoning, poses a serious
threat in this domain. A compromised global model could lead to catastrophic failures in autonomous
vehicle behavior. Robust FL mitigates this risk by ensuring the system’s resilience to such attacks,
allowing autonomous vehicle models to operate reliably even when facing adversarial clients [247, 310].
The implications of robust FL extend beyond healthcare and autonomous vehicles. As decentralized
learning systems become more widespread in industries such as finance [170], Internet of
Things (IoT) networks [171], and smart cities [178], the ability to secure these systems against
adversarial actions while preserving privacy becomes crucial. Robust FL offers a path forward by
ensuring that models can be trained securely and efficiently in environments where data security,
privacy, and robustness are non-negotiable. This makes robust FL a foundational technology for
future applications that rely on secure and scalable distributed learning systems.
In summary, robust FL presents a powerful solution for addressing the security and robustness
challenges inherent in decentralized learning environments. By improving resilience against
adversarial attacks, noisy updates, and faulty clients, robust FL ensures reliable performance and
data privacy across various sectors. From healthcare and autonomous vehicles to finance and IoT
networks, robust FL is poised to become a foundational technology for secure, decentralized learning
systems that prioritize both privacy and robustness. Future developments in this field will continue
to expand its applicability and strengthen the security of collaborative learning environments.

3 Methodology
The methodology for our SLR involves a structured and systematic approach to identify, select,
and analyze relevant research studies on the topic of robust FL. The PRISMA framework guides
the methodology, ensuring transparency and rigor in the review process.

3.1 Search String Definition


To expand the search results, we utilize “Robust Federated Learning” as the key term and include
synonyms and acronyms as supplemental terms. For each primary source, we create search strings
that check the title, abstract, and keywords. After finishing the first draft of search strings, we check
the results of each search string against each database to determine their effectiveness. Table 2
displays the finalized search terms.

Table 2. Search Key and Supplementary Terms

Key Term                   Supplementary Terms
Robust Federated Learning  robust federated aggregation, byzantine-robust federated learning,
                           byzantine-robust aggregation, byzantine-resilient federated learning,
                           byzantine-resilient aggregation, outlier-robust federated learning,
                           outlier-robust aggregation, outlier-resilient federated learning,
                           outlier-resilient aggregation, attack-adaptive aggregation,
                           attack-adaptive federated learning, backdoor federated learning.

3.2 Research Question


The primary aim of this SLR is to comprehensively investigate and synthesize the existing body
of research on Robust FL. Specifically, we seek to address the following research questions: (i)
What are the current state-of-the-art mechanisms that hamper the robustness of FL? (ii) What are
the present state-of-the-art approaches in achieving robustness in FL? (iii) What are the poten-
tial research gaps of the state-of-the-art methods in attaining robustness in FL? and (iv) What
would be the open problems and promising future research directions toward obtaining a robust
FL framework?

3.3 Study Design


This SLR adheres to the PRISMA framework, a well-established guideline for conducting system-
atic reviews. The PRISMA framework ensures a transparent, comprehensive, and reproducible
process, enabling us to identify, select, and synthesize relevant studies effectively.

3.4 Study Source


To identify relevant studies for this review, a systematic search has been conducted across various
academic databases and publication repositories. The key databases are (i) ACM Digital
Library, (ii) IEEE Xplore, (iii) ScienceDirect, (iv) Springer, (v) Web of Science (WoS), and (vi) Scopus.
The search timeframe spans 2016.01.01 to 2024.08.26. We screen and select the
papers from the initial search according to the preset inclusion and exclusion criteria elaborated
in Section 3.6.

3.5 Search Strategy


A comprehensive search strategy has been formulated using a combination of keywords and con-
trolled vocabulary relevant to Robust FL. We tailor the search strings to the specific syntax of each
database. Boolean operators (AND and/or OR) and truncation symbols are used as necessary to
ensure comprehensive coverage of relevant literature.
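As a rough sketch of how such a search string might be assembled from the terms in Table 2, the snippet below joins quoted terms with OR and applies the clause to the title, abstract, and keywords. The `field:(...)` syntax and the function name `build_query` are illustrative assumptions; each database has its own field codes and truncation symbols, which are not modeled here.

```python
def build_query(terms, fields=("title", "abstract", "keywords")):
    """OR the quoted terms together, then apply that clause to every field."""
    clause = " OR ".join(f'"{term}"' for term in terms)
    return " OR ".join(f"{field}:({clause})" for field in fields)


# A few of the terms from Table 2; the full list would be passed the same way.
terms = [
    "robust federated learning",
    "byzantine-robust federated learning",
    "backdoor federated learning",
]
query = build_query(terms)
print(query)
```

Generating the string from one term list keeps the query logically identical across sources, so that only the database-specific syntax needs adapting.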

3.6 Eligibility Criteria


To ensure the inclusion of high-quality and pertinent studies, the following eligibility criteria have
been applied during the study screening and selection process:
— Inclusion criteria:
– Studies published in peer-reviewed journals, conference proceedings, and reputable
sources.
– Studies that focus on Robust FL, addressing methods, techniques, algorithms, and
applications.

Table 3. Search Result Quantity and Selected Studies of Different Databases

Database        Result quantity   Selected papers
ACM             110               37
IEEE Xplore     205               159
ScienceDirect   90                45
Springer        236               27
WoS             132               104
Scopus          221               153
Total unique selected studies: 244

– Studies that present empirical research, experiments, simulations, or case studies related
to attaining robustness in FL.
– Studies published in the English language.
— Exclusion criteria:
– Studies that focus only on hampering robustness in FL without proposing any potential
solutions to alleviate it.
– Survey, review, and SLR studies, as they do not propose new techniques for attaining
robustness in FL.
– Studies that are not in English.
– Academic dissertations, tutorials, editorials, and magazines.

3.7 Study Selection


We perform the selection process in two phases: title/abstract screening and full-text screening.
Two independent researchers screen the titles and abstracts of the retrieved studies against the
eligibility criteria to identify potentially relevant articles. Discrepancies are resolved through
discussion or consultation with a third researcher. Subsequently, the full texts of the selected articles
are assessed for final inclusion. This process yields a total of 244 unique studies; Table 3 lists
the numbers of initial and selected studies for each primary source. Note that the
sum of selected studies across these databases exceeds 244 because some studies appear
in multiple databases. A significant portion of these papers
were initially found in Scopus and subsequently identified in other databases as well.
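The gap between per-database counts and unique studies follows directly from overlap: in Table 3 the selected counts add up to 525 (37 + 159 + 45 + 27 + 104 + 153), well above the 244 unique studies. The sketch below illustrates one plausible way to merge selections across sources. Matching on a normalized title is an assumption made for illustration (the review does not state its deduplication key), and the paper titles are placeholders.

```python
def unique_studies(results_by_database):
    """Merge per-database selections, keyed by a whitespace- and case-normalized title."""
    merged = {}
    for database, titles in results_by_database.items():
        for title in titles:
            key = " ".join(title.lower().split())
            merged.setdefault(key, []).append(database)
    return merged


# Placeholder titles; the same paper may be indexed by several databases.
selected = {
    "Scopus": ["Paper A", "Paper B", "Paper C"],
    "IEEE Xplore": ["paper a", "Paper  B"],
    "ACM": ["Paper D"],
}
merged = unique_studies(selected)
per_database_sum = sum(len(titles) for titles in selected.values())
print(per_database_sum, len(merged))
```

The mapping from each study to the databases that index it also makes it easy to see which source contributed a study first.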

3.8 Data Extraction and Synthesis


Data extraction is performed by two independent researchers using a standardized data extraction
form. The form captures essential information from each included study, such as authors,
publication year, type of robustness considered, the FL component modified to attain robustness,
robustness technique and approach, evaluation type, application domain, and dataset used. In
addition, we identify the potential gap(s) of every included study for the research
community to consider. All data are recorded in a Microsoft Excel sheet for analysis and synthesis.
The extracted data are synthesized and summarized qualitatively to address the research questions.

3.9 Reporting
The results are reported following the PRISMA guidelines. The findings are organized based on
eight themes and research trends. We include a flow diagram in Figure 2 depicting the study
selection process, a summary of key findings, and an in-depth discussion of the research trends,
challenges, and opportunities in Robust FL considering these eight themes.

3.10 Discussion and Future Directions


The findings are discussed in the context of the existing literature. Emerging trends and gaps
in the field are identified, leading to the formulation of future research directions to address the
challenges of robust FL.
Fig. 2. PRISMA flow diagram of the study selection process.

Fig. 3. Number of yearly publications and prediction of the future years.

Fig. 4. Geographical scope of the studies.

4 Descriptive Result Analysis


4.1 Publication Trend
We utilize graphs and charts in our bibliometric analysis to illustrate the findings of scientific
endeavors documented in research databases. Figure 3 illustrates the trajectory of scholarly
publications concerning the attainment of robustness in FL from 2017 to August
2024, with a prediction for the remainder of 2024 and all of 2025 obtained using linear regression
[59]. The figure indicates an upward trajectory in the number of publications
addressing this subject. Before 2020, there was little substantial research focusing
on the robustness associated with FL. This can be attributed partly to the fact that robustness
challenges in FL only emerge after the initial stages of the adoption and execution of FL. The
primary surge in research regarding these challenges occurred between 2020 and 2022, by which point
the volume of publications concerning the robustness associated with FL had increased significantly.
We have even more papers from 2023 and a substantial number of papers from 2024 (up until
August). This surge can be attributed to FL being at a nascent stage of successful usage and execution
globally.
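For readers who want to reproduce this kind of forecast, the following is a minimal sketch of an ordinary least-squares line fitted to yearly publication counts and extrapolated one step ahead. The counts below are made-up placeholders, not the values plotted in Figure 3, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder yearly counts (NOT the review's data).
years = [2019, 2020, 2021, 2022, 2023]
counts = [5, 15, 25, 35, 45]

slope, intercept = fit_line(years, counts)
forecast_2025 = slope * 2025 + intercept
print(slope, forecast_2025)
```

A straight line is a coarse model for a young field with accelerating output, so such forecasts are best read as a trend indicator rather than a precise estimate.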

4.2 Coverage of Researchers


Figure 4 portrays the distribution of publications across different countries concerning the attain-
ment of robustness in FL, while Table 4 provides the country names and their actual number of pub-
lications. A significant portion of research efforts in this domain has been concentrated in China.

Table 4. Country-wise Authors Count of the Selected Studies

Label  Country      Studies   Label  Country      Studies   Label  Country      Studies
1      China        138       11     Spain        3         21     Norway       2
2      USA          66        12     Qatar        3         22     Netherlands  4
3      Australia    26        13     Iran         3         23     KSA          2
4      UK           19        14     India        9         24     Italy        4
5      Singapore    8         15     Germany      5         25     Japan        3
6      Hong Kong    8         16     Austria      3         26     Romania      1
7      South Korea  9         17     Pakistan     2         27     Ireland      1
8      Canada       10        18     UAE          4         28     Bangladesh   1
9      Sweden       4         19     France       4
10     Switzerland  3         20     Luxembourg   1

Fig. 5. Types of the studies from 2017 to 2024 (pie chart: Journal 125, 51.2%; Conference 102, 41.8%; Chapter 17, 7.0%).

Fig. 6. Commonly used datasets from 2017 to 2024.

Up until August 2024, Chinese researchers had authored 138 publications in this field, followed by
the United States with 66 (Table 4).

4.3 Type and Application of the Studies


Figure 5 illustrates the predominant types of publications, namely, conference articles (42%),
journal articles (51%), and book chapters (7%), during the period from 2016 to August 2024. This data
suggests that researchers and scholars exhibit a stronger inclination toward disseminating their
work through conferences and journals. These platforms offer the opportunity
to receive targeted recommendations, insights, and feedback from reviewers and fellow
participants. Figure 6, in turn, provides an overview of publications categorized by
application area, addressing challenges related to robustness in FL from 2016 to 2024. The data
from Figure 6 underscores the widespread concern among researchers from different application
domains regarding robustness issues with FL. Computer Vision (CV), language modeling, and
face recognition received the most attention among the application areas. Conversely, application
areas such as cybersecurity, IoT, health, and bioinformatics exhibit limited research output,
which highlights the need for increased research focus within these application
domains.

4.4 Collaborative and Non-collaborative Nature of the Studies


Figure 7 depicts the sources (organizations) of the research conducted. It can be seen that 88.9%
of the research was independently undertaken by universities, with a smaller fraction originating
from industrial entities and academia-industry collaborations. Figure 8, in turn, visualizes the
network of co-authorship connections between countries in the field of robustness in FL. In this

Fig. 7. Source of the studies from 2017 to 2024 (bar chart of the number of studies by source: Academic 217, Industrial 13, Combined 14).

Fig. 8. Co-authorship network of countries.

Fig. 9. Distribution of the studies based on different attacks.

Fig. 10. Distribution of the studies based on thematic foci (client-side and server-side treatment).

network, each node represents a country, and the node’s size is based on the number of publica-
tions from that country. The thickness of the connections between nodes indicates the strength
of collaboration between those countries. The map depicted in Figure 8 illustrates the central col-
laborative roles of China, the United States, the United Kingdom, and Australia, as indicated by
their larger nodes. Singapore, Hong Kong, South Korea, India, and Canada show moderate lev-
els of collaboration with other countries. Conversely, Austria, Luxembourg, France, Kingdom of
Saudi Arabia, Ireland, Norway, and Bangladesh emerge as less central countries in terms of co-
authorship. Notably, the intensified connections between China and the United States suggest a
robust collaborative partnership between these two nations.

4.5 Distribution of the Studies Based on Different Attacks


A total of 244 studies encompass seven distinct types of attacks. Studies addressing both data and
model poisoning attacks account for 84 studies, Byzantine adversarial attacks account for 61 studies,
and model poisoning attacks comprise 53 studies. These three categories collectively constitute a
substantial portion of the studies. Additionally, 10 studies are dedicated to data poisoning attacks
and 16 to label-flipping attacks. The class of malicious clients encompasses 12 studies, and the
remaining 8 studies are categorized under targeted and untargeted attacks. Figure 9 visualizes this
distribution.

4.6 Distribution of the Studies Based on the Eight Thematic Foci


A total of 244 studies have been categorized across eight themes that we consider representative of
ensuring robustness in FL. Out of these 244 studies, 72 pertain to client-side treatment, while the
remaining 172 are related to server-side treatment. Within the client-side treatment category, 8
studies focus on objective regularization, 14 on optimizer modification, 10 on the implementation
of differential privacy, 15 on additional dataset requirements and decentralization orchestration,
and the remaining 25 are centered around manifold methods. On the server-side treatment, the
majority of studies, totaling 83, revolve around aggregation hyperparameter tuning, followed by

37 studies concerning client selection, and 52 studies involving modifications to the aggregation
approach. These findings are visually represented in Figure 10.
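The counts reported in this subsection can be cross-checked with a short tally. The dictionary keys simply restate the theme names, and the numbers are those quoted in the text above.

```python
# Study counts per theme, as quoted in Section 4.6.
client_side = {
    "objective regularization": 8,
    "optimizer modification": 14,
    "differential privacy employment": 10,
    "additional dataset requirement and decentralization orchestration": 15,
    "manifold": 25,
}
server_side = {
    "aggregation hyperparameter tuning": 83,
    "client selection": 37,
    "aggregation approach modification": 52,
}

client_total = sum(client_side.values())
server_total = sum(server_side.values())
print(client_total, server_total, client_total + server_total)
```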

5 Thematic Result Analysis


The encompassing review of selected studies (n = 244) investigates diverse dimensions of
techniques to attain robustness in FL. To systematically synthesize this extensive collection of findings,
each study underwent meticulous evaluation to unveil recurring themes. Employing content
analysis, we identify eight key emergent themes for ensuring robustness in FL: five belong to
client-side treatment and the remaining three to server-side treatment. These thematic foci are
intended to present an unbiased and clear narrative of the selected
studies. In particular, the eight themes are (i) Objective Regularization, (ii) Optimizer Modification,
(iii) Differential Privacy Employment, (iv) Additional Dataset Requirement and Decentralization
Orchestration, (v) Manifold, (vi) Client Selection, (vii) Aggregation Approach Modification, and
(viii) Aggregation Hyperparameter Tuning.

5.1 Client-side Treatment


5.1.1 Objective Regularization. Objective regularization methods in FL aim to enhance robust-
ness by introducing additional terms or penalties into the optimization objective function. These
approaches focus on improving the global model’s performance and stability by encouraging bet-
ter cooperation among participating clients while considering the potential presence of malicious
or faulty data. In this SLR, we identified eight works focusing on objective regularization to ensure
robustness in FL through different techniques.
L2 and Norm-based Regularization: L2 regularization is commonly used to improve model
robustness by preventing overfitting. Probabilistic L2 norm checks are applied in RiseFL [322] to re-
sist Byzantine attacks, although these checks can be computationally costly. FedProx [120] also uti-
lizes L2 regularization, ensuring consistent model updates across clients to stabilize global model
performance. Robustness through Jacobian and Huber Regularization: Jacobian-based reg-
ularization is employed in Flmjr [66] to maintain model stability by constraining perturbations
in the model structure. Meanwhile, multi-dimensional Huber loss minimization [305] promotes
Byzantine robustness, achieving promising results, though non-IID data scenarios require further
testing. Loss Function Approximation Techniques: Loss function-based regularization can pre-
vent models from becoming overfitted to local noise. Convex approximation and loss function
smoothing [8] leverage a sampling-based convex approximation to optimize the model objective
function, especially effective under non-IID client data distributions.
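To make the idea of a regularized local objective concrete, the sketch below shows the proximal term used by FedProx, which adds (mu / 2) * ||w - w_global||^2 to each client's loss so that local weights stay close to the current global model. The plain-list weight representation, the function names, and the values of `mu` and `lr` are illustrative assumptions; a real implementation would operate on framework tensors.

```python
def proximal_penalty(local_w, global_w, mu):
    """Proximal term (mu / 2) * ||w - w_global||^2 added to the local loss."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))


def proximal_gradient(local_w, global_w, mu):
    """Gradient of the proximal term with respect to the local weights."""
    return [mu * (w - g) for w, g in zip(local_w, global_w)]


def local_step(local_w, global_w, task_grad, mu, lr):
    """One gradient step on task loss + proximal term."""
    prox_grad = proximal_gradient(local_w, global_w, mu)
    return [w - lr * (g + p) for w, g, p in zip(local_w, task_grad, prox_grad)]


# Hypothetical values: with a zero task gradient, the step pulls w toward w_global.
penalty = proximal_penalty([1.0, 2.0], [0.0, 0.0], mu=0.1)
new_w = local_step([1.0], [0.0], task_grad=[0.0], mu=0.5, lr=0.1)
print(penalty, new_w)
```

A larger `mu` limits how far a client can drift from the global model in one round, which stabilizes training under heterogeneous data at the cost of slower local progress.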
Label Noise Handling through Output Regularization: Regularization methods targeting
noisy labels are crucial for managing variability in client data quality. In Reference [82], self-
distillation is used to minimize output discrepancies between original and augmented data, re-
ducing the model’s tendency to memorize noise. Local weight adjustment on noisy datasets [27]
enables edge devices to adjust local weights before aggregation, improving cooperation between
clients and the central server. Adaptive and Byzantine-resilient Loss Selection: Adaptive
approaches to regularization allow the model to respond dynamically to malicious data. Adaptive
regularization with loss selection in Reference [101] combines Byzantine-resilient aggregation with
loss-based adjustments, which aids intrusion detection but may face challenges with model stabil-
ity over time. A comprehensive comparative analysis of these approaches is provided in Table 5,
which highlights potential gaps in their effectiveness.
5.1.2 Optimizer Modification. Approaches that focus on modifying optimization methods, such
as SGD, Adam, and others, play a crucial role in enhancing the robustness of FL. These methods

aim to address challenges associated with noisy updates and adversarial attacks, which can hin-
der convergence and degrade model performance. This theme in our SLR comprises 14 studies,
summarized as follows:
Byzantine-resilient Gradient Descent: Several approaches enhance SGD to withstand Byzan-
tine attacks. Byzantine-resilient SGD [200] simplifies convergence theory to improve resilience,
while adaptive gradient descent [224] incorporates trimmed mean aggregation for robustness in text
classification. Additionally, maximum transmit power adjustments [44] in SGD enhance update
reliability, and gradient-similarity-based secure consensus [261] dynamically excludes Byzantine
gradients. Noise and Compression Reduction Techniques: Noise reduction and compression
improvements focus on maintaining gradient integrity during transmission. Gradient difference
compression [315] reduces compression noise, boosting Byzantine resilience, while ternary quan-
tization and error feedback [265] enhance the detection of malicious clients. PolyFlagSVM [165]
applies encryption for noise resilience, particularly in medical imaging. Robust Optimization
with Minimax and Distributional Methods: Robust optimizations include distributionally ro-
bust optimization [211] for noisy-label resilience in intrusion detection, albeit with added complex-
ity. Minimax optimization [198] guards against distribution shifts, enhancing general robustness
but requiring computational tradeoffs.
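Trimmed mean aggregation, mentioned above, is easy to illustrate. The following is a generic coordinate-wise trimmed mean, not the specific procedure of any cited study; `trim_k` and the example updates are hypothetical.

```python
def trimmed_mean(updates, trim_k):
    """Coordinate-wise trimmed mean: per coordinate, drop the trim_k smallest and
    trim_k largest values across clients, then average what remains."""
    n = len(updates)
    if 2 * trim_k >= n:
        raise ValueError("trim_k must be smaller than half the number of updates")
    aggregated = []
    for coord_values in zip(*updates):
        kept = sorted(coord_values)[trim_k:n - trim_k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Three benign updates and one extreme (Byzantine-style) update.
updates = [
    [0.9, -0.1],
    [1.0, 0.0],
    [1.1, 0.1],
    [100.0, -50.0],
]
robust = trimmed_mean(updates, trim_k=1)
print(robust)
```

Because the extreme values are discarded per coordinate, a single outlying client cannot move the aggregate arbitrarily far, provided `trim_k` is at least the number of malicious clients.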
Resampling and Buffering Techniques: Resampling methods address variations in updates
from Byzantine clients. SAGA [184] applies a resample-then-aggregate strategy to counter Byzan-
tine variations. Buffered asynchronous gradient descent [269] utilizes gradient buffers and adaptive
reassignment, which are effective against poisoning attacks but can delay convergence. Adaptive
Optimization and Client Selection: Adaptive methods modify optimization strategies based
on current client behavior. Adaptive FL [230] adjusts to model poisoning and Byzantine threats,
though effectiveness can decline under high attack intensities. Wasserstein-distance-based client
selection [204] strengthens adversarial robustness by dynamically selecting clients, despite re-
source demands. Lightweight and Computation-optimized Methods: Optimizations that re-
duce computation focus on minimizing load for non-faulty agents. For instance, non-faulty agent
optimization [137] lowers computational demand by prioritizing essential tasks. Stochastic gradient
methods [261] apply gradient selection strategies to minimize unnecessary calculations, supporting
scalability. Table 6 provides a comprehensive comparison of these approaches to improving FL
robustness against noisy updates and adversarial attacks.

5.1.3 Differential Privacy Employment. Approaches that leverage differential privacy to en-
hance the robustness of FL focus on preserving individual privacy while aggregating model up-
dates from multiple clients. These methods address data privacy and security concerns, ensuring
that the aggregated global model remains resilient against privacy breaches and adversarial at-
tacks. By introducing controlled noise to the model updates, differential privacy techniques offer a
formal privacy guarantee while enabling effective collaborative learning. In our SLR, we identified
10 studies within this theme, which are discussed below.
Basic Differential Privacy Mechanisms for Privacy Preservation: Some approaches em-
ploy traditional DP to secure model updates. Joint DP in FL [290] limits adversarial influence by en-
suring each update minimally impacts the global model, while Local Differential Privacy (LDP)
[148] adds local noise before updates are aggregated, improving privacy protection. Shuffle and
Protocol-based Privacy Schemes: Protocols using shuffling enhance privacy by obscuring indi-
vidual contributions. In Reference [156], shuffle protocols and summation-based DP secure model
parameters during exchange and global aggregation. SIREN+ framework [65] combines DP with
proactive alarms, offering Byzantine resilience but with high communication overhead. Gauss-
ian Noise and Variance Reduction Techniques: Gaussian noise is added to model updates in

schemes like VPPFL [79], defending against model poisoning with DP-backed noise injection. Spar-
sification and variance reduction [297] merge DP with sparsity to minimize communication loads,
yet this approach may lead to high variance and processing demands in large models.
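The noise-injection step common to these schemes can be sketched as the Gaussian mechanism applied to a clipped update: the update is first clipped to a fixed L2 norm, then Gaussian noise scaled to that norm is added. This is a generic sketch, not the mechanism of any specific cited work. The parameter names `clip_norm` and `noise_multiplier` are assumptions, and translating them into a concrete privacy budget ϵ requires a privacy accountant that is not shown here.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale the update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip the update, then add Gaussian noise with std = noise_multiplier * clip_norm."""
    clipped = clip_update(update, clip_norm)
    sigma = noise_multiplier * clip_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


rng = random.Random(0)  # seeded only to make the illustration repeatable
clipped = clip_update([3.0, 4.0], clip_norm=1.0)
noisy = privatize_update([3.0, 4.0], clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(clipped, noisy)
```

Clipping bounds the influence of any single client, which is also why DP-style mechanisms offer some protection against model poisoning in addition to privacy.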
Game Theory and Adversarial Training Integrations: Integrating game theory and adver-
sarial training into DP frameworks adds robustness against targeted attacks. Adversarial training
with logit calibration [190] manages class imbalances, though it struggles with scalability. Game
theory and DP integration [290] protects against model poisoning, balancing client influence with
privacy, though practical challenges arise in implementation. DP-enhanced Gradient Descent
and Online Learning: Combining DP with gradient descent techniques provides robust learning
against Byzantine clients. Online Mirror Descent with DP [176] offers a solution for continuous
learning under DP, but it faces scalability limitations. In Reference [220], weak DP in gradient
descent tackles frequent, unconstrained attacks, securing updates during random sampling. Table 7
provides a comprehensive comparative analysis indicating the potential gaps in these
approaches.

5.1.4 Additional Dataset Requirement and Decentralization Orchestration. Approaches that in-
volve additional dataset requirements and decentralized orchestration aim to enhance the robust-
ness of FL by considering diverse data sources and improving coordination among participating
clients. These methods address challenges related to noisy updates and adversarial attacks, which
can hinder convergence and degrade model performance in decentralized settings. In our SLR, we
identified 15 studies within this theme.
Blockchain and Smart Contract Integration: Blockchain technology provides a decentral-
ized ledger that enhances data integrity, resilience, and security within FL. Studies such as Refer-
ence [246] present a blockchained decentralized FL framework for IoT to enhance privacy and de-
fend against poisoning attacks and server failures. However, it is limited to MNIST testing without
new aggregation methods, non-IID data considerations, or robust baseline comparisons. Addition-
ally, blockchain-based FL for privacy and data integrity [46] uses blockchain to counter poison-
ing and server failures, while FedAnil [49] combines blockchain with clustering and encryption,
despite facing non-IID data limitations. Blockchain-based consensus with smart contracts [48] in-
corporates differential privacy, although scalability remains a constraint. Shadow and Reference
Dataset Use for Model Evaluation: Shadow datasets are used as external references to evaluate
or filter out malicious models. In FedSuper [304], a Local Model Filter (LMF) applies shadow
data for poisoning detection, while shadow datasets in filtering [243] employ accuracy metrics for
model evaluation, though both approaches encounter scalability issues. Reference datasets [253]
allow nodes to assess model performance, adding robustness against adversarial updates.
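The general idea of filtering with a shadow or reference dataset can be sketched as follows: each submitted model is scored on held-out reference data, and models below an accuracy threshold are excluded from aggregation. This is a generic illustration rather than the filter of any cited framework; the toy models, the dataset, and the `min_accuracy` threshold are all hypothetical.

```python
def accuracy(model, dataset):
    """Fraction of (features, label) pairs that the model classifies correctly."""
    correct = sum(1 for features, label in dataset if model(features) == label)
    return correct / len(dataset)


def filter_models(client_models, shadow_dataset, min_accuracy):
    """Keep only the client models whose shadow-set accuracy meets the threshold."""
    return {
        client_id: model
        for client_id, model in client_models.items()
        if accuracy(model, shadow_dataset) >= min_accuracy
    }


# Toy 1-D task: the label is 1 when the feature is positive.
shadow_dataset = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
client_models = {
    "honest": lambda x: int(x > 0),
    "poisoned": lambda x: int(x <= 0),  # behaves as if trained on flipped labels
}
kept = filter_models(client_models, shadow_dataset, min_accuracy=0.75)
print(sorted(kept))
```

The scalability concern noted above is visible even here: every candidate model must be evaluated on the reference data in every round.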
Cross-validation and Auditing Mechanisms: Cross-validation techniques support model
evaluation in FL, often paired with decentralized or blockchain setups. Cross-validation against
Byzantine attacks [111] uses blockchain validation but faces computational overhead. FL-Auditor
[299] introduces third-party auditing to increase accountability, though scalability is a concern. De-
centralized Orchestration for Improved Resilience: Decentralized orchestration approaches
reduce reliance on a central server by distributing roles among clients. Ring-Allreduce with se-
cret sharing [226] enables robust, decentralized aggregation, though communication demands are
high. Byzantine resilience through role rotation [16] further enhances robustness, though it re-
quires testing across different attack types. Privacy-preserving Techniques in Decentralized
FL: Privacy-preserving methods in decentralized FL combine blockchain or other security tech-
niques with privacy measures. Hyperledger Fabric for secure predictions [89] protects FL data,
while Cre-DPSIGN with differential privacy [39] combines blockchain and DP but requires exten-
sive tuning. Reverse aggregation [103] and gradient aggregation with anomaly detection [141]

add privacy layers but can suffer from overhead and complexity. We provide a comprehensive
comparative analysis, outlining the potential gaps in these approaches in Table 8.
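Secret sharing in decentralized aggregation can be illustrated with plain additive secret sharing: each client splits its (integer-encoded) value into random shares that sum to the value modulo a prime, so no single party sees another client's input while the total can still be recovered. This sketch does not reproduce the Ring-Allreduce protocol of any cited work; the modulus, the function names, and the integer encoding are assumptions, and Python's `random` module is used only for repeatability (a real system would use a cryptographically secure source).

```python
import random

PRIME = 2**61 - 1  # modulus for the share arithmetic (illustrative choice)


def make_shares(value, n_parties, rng):
    """Split `value` into n_parties additive shares modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(values, rng):
    """Each client shares its value; party j adds up the j-th shares it receives,
    and the partial sums are combined to reveal only the total."""
    n = len(values)
    all_shares = [make_shares(v, n, rng) for v in values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return sum(partial_sums) % PRIME


total = secure_sum([5, 7, 11], random.Random(0))
print(total)
```

Each client sends one share to every other party, which reflects the high communication demand noted for such schemes.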

5.1.5 Manifold. Manifold approaches in FL address robustness by leveraging manifold properties
to enhance resilience against adversarial attacks, noise, and data heterogeneity. These 25
studies ([22, 23, 40, 47, 54, 72, 81, 85, 102, 128, 151, 159, 174, 234, 255, 260, 279, 286, 288, 293, 306,
311, 313, 323]) explore diverse methods, including hyper-dimensional learning, adversarial
training, model tuning, and more.
Adversarial Training and Defense Mechanisms: These studies focus on enhancing re-
silience to adversarial attacks through manifold-based adversarial training. Approaches such as
GAN-based training [151] for adversarial robustness, ARMOR framework [40, 293] in computer
vision, and ensemble learning [306] for Byzantine attack defense exemplify this category. Refer-
ence [313] uses a client update mechanism to enhance robustness by maximizing bias and variance
during updates for adversarial defense, while [128] implements adversarial training with logits cal-
ibration (FedALC) to strengthen defenses against adversarial attacks in FL. Backdoor and Model
Poisoning Defense: Strategies in this category aim to protect FL models from backdoors and
poisoning attacks. Backdoor attack mitigation [234] uses client-side model tuning, while gener-
ative adversarial approaches for edge cases [40] target specific backdoor scenarios. Additionally,
personalized training [47] and label correction networks [288] address label noise and poisoning
attacks in domains such as medical imaging. Reference [323] uses a data-sharing strategy and
attention-augmented model to obtain pre-trained model parameters and uniform data. Reference
[174] investigates the impact of data poisoning attacks on model robustness using the FedAvg ap-
proach, highlighting FL’s vulnerability to such threats. Reference [279] focuses on understanding
how data heterogeneity influences the effectiveness of backdoor attacks in FL, aiming to quan-
tify its impact on model performance. Expanding on backdoor defense, Reference [255] provides
both theoretical and experimental insights into robustness against limited-magnitude backdoor
attacks, establishing foundational guarantees for model safety. Last, Reference [260] introduces a
backdoor defense approach using graph neural networks with subgraph trigger injection in FL for
drug discovery, targeting domain-specific backdoor vulnerabilities.
Optimization and Adaptation for Robustness: Methods that focus on model tuning and dynamic
adaptation to enhance FL robustness fall into this category. For example, adaptive learning
rates [311] adjust based on model behavior, while autonomous rate adjustments [286] manage
learning progress. Hyper-dimensional learning [23] also speeds up training and mitigates
communication costs, contributing to FL’s robustness, while Reference [72] integrates teaching and
learning for robust prediction, focusing on managing sensor noise and environmental perturbations
through reliable data instances.
Privacy and Secure Aggregation: Techniques in this category aim to ensure secure, private
aggregation to counter poisoning attacks. Approaches such as encrypted verification schemes [85],
SAFEFL with MPC [54], and contrastive clustering [102] for non-IID resilience aim to safeguard
aggregation processes against privacy breaches and malicious data manipulation.
Noisy Label Management and Resilience: Several studies address the impact of noisy or
heterogeneous data labels, which is crucial for robust FL model performance. Self-paced learning
[81] mitigates label noise in medical imaging, while detection and exclusion of noisy federated
weights [159] manage adverse data impacts. These approaches help improve model stability and
performance in real-world FL scenarios.
Decentralized Data Integrity: Some manifold methods involve decentralized data integrity
strategies and enhance robustness against central model poisoning. Studies include FLGuard [22],
which incorporates decentralized orchestration to withstand attacks and promote data fidelity.
These manifold methods enhance FL’s resilience against adversarial attacks, data diversity, and
communication challenges, as summarized in Tables 9 and 10.

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.

5.2 Server-side Treatment


5.2.1 Client Selection. Client selection in FL encompasses various innovative strategies across
83 studies to strategically choose clients, ensuring robust global model training while mitigating
the impact of noisy or malicious contributions.
Approaches Utilizing Client Features and Behavior: Studies like Reference [181] focus on
managing delays by categorizing models by origin, selecting representatives based on delays, and
weighting contributions to ensure consistent participation. Secure aggregation via model filtering,
such as the multi-Krum method with homomorphic encryption in Reference [238], and layer-wise
network parameter distance approaches in Reference [147], which emphasize standard-range dis-
tances, are employed to mitigate offline client interference. Reference [88] ensures robust partici-
pation by calculating client contributions and aggregating data label-wise, while Reference [119]
introduces automatic update clipping to improve clustering schemes based on vector norms. Filter-
ing out malicious clients through univariate outlier detection, as seen in Reference [201], is another
key method. Recent contributions include history-based anomaly detection [104], where adaptive
trust scaling mitigates on-off, bad-mouthing, and good-mouthing attacks. Furthermore, in Refer-
ence [225], cosine similarity-based detection is integrated with personalized logit adjustment to
handle non-IID data. Reference [140] addresses information leakage by utilizing population dis-
tribution for client selection, while Reference [17] employs trust bootstrapping based on a root
dataset to weigh client updates for global aggregation. Reference [69] tracks the history of gradient
updates with similarity metrics to detect unreliable clients, and Reference [4] uses clustering with
cosine similarity and meta-learning to group and select high-quality clients. Reference [251]
dynamically extracts features and clusters clients based on sparse representations to exclude
malicious clients, particularly in medical imaging applications. Last, Reference [308] embeds watermarks
in global models, tracking watermark recession to identify malicious clients.
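The trust-bootstrapping idea mentioned above, in which the server weighs client updates against a reference update computed on a small root dataset, can be illustrated with a minimal sketch. This is not the algorithm of Reference [17]; the function names, the clipping of negative similarities to zero, and the toy vectors are assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def trust_weighted_aggregate(client_updates, reference_update):
    """Weight each client update by its clipped cosine similarity to the
    server's reference update. Returns (aggregate, trust_scores)."""
    # Assumption: updates pointing away from the reference get zero trust.
    trust = [max(0.0, cosine(u, reference_update)) for u in client_updates]
    total = sum(trust)
    dim = len(reference_update)
    if total == 0.0:
        return [0.0] * dim, trust
    aggregate = [
        sum(t * u[j] for t, u in zip(trust, client_updates)) / total
        for j in range(dim)
    ]
    return aggregate, trust

# Hypothetical toy data: two roughly aligned clients and one opposing client.
updates = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
reference = [1.0, 1.0]
aggregate, scores = trust_weighted_aggregate(updates, reference)
```

In this toy run the third client points in the opposite direction of the reference update, so its trust score is clipped to zero and it does not influence the aggregate.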
Advanced Filtering and Aggregation Techniques: A variety of distance-based, clustering,
and gradient-filtering approaches are utilized to secure robust client selection. For example, Eu-
clidean distance calculations are used in Reference [67] to select benign nodes, while Reference
[259] leverages norm-based gradient filters and trimmed-mean aggregation to filter out anomalous
updates. The three-stage method in Reference [239] focuses on data complexity features to combat
poisoning, while Reference [130] introduces binary permutation for secure aggregation, creating
supernodes for enhanced synchronization. In studies such as References [145] and [202], multi-
stage approaches emphasize pre-aggregation filtering and robust aggregation using the IOWA op-
erator, respectively, to weigh model updates. Newer methods such as the dynamic gradient filter
in Reference [34] and GradScore-based detection in Reference [266] have shown high resilience to
Byzantine failures and targeted poisoning attacks. Additionally, reputation-based selection in Ref-
erence [179] and trust synthesizing mechanisms [55] use client trust scores dynamically, adapting
client selection based on performance and historical reliability. Reference [166] integrates anomaly
detection to filter out malicious updates but may incur high communication and computation
overhead, which raises scalability concerns. Reference [227] uses PCA-based reduction to filter malicious
gradients by analyzing variations in local updates, and Reference [222] employs GAN-based coun-
termeasures to detect and exclude anomalous client updates. References [300] and [257] imple-
ment mechanisms that use model-level unsupervised learning and truth discovery, respectively,
to filter poisoned or malicious updates. References [144] and [235] apply specific testing datasets
and adaptive clipping to detect and exclude malicious contributions. Reference [301] employs a
double-aggregation mechanism with weighted averaging to further refine anomaly detection during
model updates. Reference [132] employs a voting mechanism to filter abnormal directions
and a scaling mechanism to adjust abnormal magnitudes, but the global model may struggle to
converge under highly non-IID settings, and the method is potentially vulnerable to adaptive attacks.
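As an illustration of how a norm-based filter can be combined with trimmed-mean aggregation, as described above for Reference [259], consider the following sketch. It is a generic construction rather than the cited method; the median-based norm bound, the `factor` and `trim` parameters, and the sample updates are all assumptions.

```python
import statistics

def l2_norm(v):
    """Euclidean norm of a vector."""
    return sum(x * x for x in v) ** 0.5

def norm_filter(updates, factor=3.0):
    """Keep updates whose L2 norm is at most `factor` times the median norm."""
    norms = [l2_norm(u) for u in updates]
    bound = factor * statistics.median(norms)
    return [u for u, n in zip(updates, norms) if n <= bound]

def trimmed_mean(updates, trim=1):
    """Coordinate-wise mean after dropping the `trim` smallest and
    `trim` largest values in every coordinate."""
    if len(updates) <= 2 * trim:
        raise ValueError("need more than 2 * trim updates")
    result = []
    for coord in zip(*updates):
        kept = sorted(coord)[trim:len(coord) - trim]
        result.append(sum(kept) / len(kept))
    return result

# Hypothetical updates: four similar clients and one with an inflated norm.
updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0], [100.0, -100.0]]
survivors = norm_filter(updates)            # drops the inflated update
robust = trimmed_mean(survivors, trim=1)    # robust == [1.0, 2.0]
```

The two stages are complementary: the norm filter removes updates with implausibly large magnitude, while the trimmed mean limits the influence of any remaining extreme coordinate values.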


Client Selection Based on Resource Parameters: Optimal client selection based on resource
metrics, such as computation power, data volume, and network bandwidth, is discussed in studies
like Reference [70]. Trust scores and historical performance are also leveraged, as seen in Refer-
ence [295], where updates are forecasted based on previous iterations. Byzantine-robust methods,
including zero-knowledge proof filtering in Reference [157], validate updates without data expo-
sure, while cosine similarity metrics in Reference [163] focus on identifying malicious gradient
contributions. Recent advancements further incorporate adaptive and dynamic reputation-based
protocols [95] and hypermesh-based client grouping with secret sharing [154], offering refined
client selection that scales across complex and high-dimensional environments. Reference [7]
selects clients based on resource parameters, applying a sampling-based successive
convex approximation to overcome non-convex challenges in training and addressing resource
constraints by managing communication and computational demands within client selection.
Handling Channel Uncertainty and Complex Objective Optimization: Channel reliability is vital
in FL, particularly in noisy communication channels. To address this, Reference [52] introduces a
method optimizing transmit power and beamforming to minimize MSE under channel uncertainty.
Recent methods, including the Wasserstein distance-based approach in Reference [146] and Grad-
Score for gradient monitoring [100], focus on optimizing the robustness of the aggregated model
against high channel variability. The use of PCA for dimensionality reduction [167] also addresses
channel uncertainty by clustering updates, while incentive mechanisms based on credit scores [63]
balance computational efficiency and model fidelity. Reference [42] adjusts local updates with pref-
erence values, enhancing client selection by assessing loss reduction and historical performance,
effectively optimizing for robust aggregation despite complex channel conditions.
Sophisticated Attack Mitigation and Robust Model Parameter Selection: To enhance ro-
bustness against sophisticated attacks, multiple methods employ anomaly detection and cosine
similarity filtering. For example, Reference [272] integrates adaptive filtering with cosine similar-
ity and Euclidean distance to combat Byzantine and label-flipping attacks, dynamically weighing
client updates. In Reference [312], feature center separation techniques alongside outlier detec-
tion are employed to isolate malicious updates. Studies such as Reference [271] further add secure
enclave-aided aggregation to protect updates, while Reference [169] uses saliency maps and struc-
tural similarity to detect gradient anomalies. Reference [36] leverages 3PC to securely select clients
and aggregate models, enhancing privacy under Byzantine and poisoning attacks. References [134]
and [68] use cosine similarity and element-wise sign functions, respectively, to identify malicious
clients, while Reference [242] combines SVD and clustering to filter malicious updates. Additional
methods such as References [57, 138, 173] incorporate mechanisms such as amplification, cosine
distance, and adaptive clipping for selective filtering and punishment of malicious contributions,
effectively enhancing robust parameter aggregation. Reference [296] estimates data divergence
among clients to detect backdoor clients but the defense performance may degrade under non-IID
settings and when adaptive attacks are specifically designed against the proposed method. Refer-
ence [115] uses private evaluations and clustering to detect outliers, and Reference [61] relies on
VAE-based anomaly detection to exclude malicious updates.
Emerging Techniques for Aggregation Robustness and Incentivized Participation:
Some methods are dedicated to robust aggregation with advanced filtering. For instance, robust
federated clustering with meta-learning [27] aggregates models from similar clusters to minimize
gradient divergence. Blockchain-based models, like Reference [229], introduce decentralized com-
mittees for model aggregation, and Reference [319] employs Bayesian frameworks to estimate
client utility adaptively. Incentive-based schemes, such as References [58] and [248], promote fair-
ness in client selection by rewarding trustworthy clients and mitigating label noise heterogeneity.
These strategies ensure long-term participation and maintain trustworthiness. Reference [183]
uses Shapley values to measure client contributions, incentivizing participation by selecting valu-
able clients. Reference [298] introduces parameter compression and cosine similarity-based quality
checks to select reliable clients, while Reference [249] combines gradient similarity with contract
theory for fair incentive distribution among participants. Reference [232] applies similarity-based
and spectral outlier detection in a four-pronged defense approach, and Reference [244] uses coun-
ters and authenticators to detect and block malicious clients in real-time, focusing on incentivizing
and maintaining reliable client participation.
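To make the Shapley-value notion of client contribution concrete, the sketch below computes exact Shapley values by enumerating all client orderings. It is only feasible for a handful of clients and is not the procedure of Reference [183]; the `ACCURACY` table is a hypothetical utility (for example, the validation accuracy of a model trained on each coalition) invented for illustration.

```python
from itertools import permutations

def shapley_values(clients, utility):
    """Exact Shapley value of each client for a set-valued utility function.
    Averages each client's marginal contribution over all orderings."""
    values = {c: 0.0 for c in clients}
    orderings = list(permutations(clients))
    for order in orderings:
        coalition = frozenset()
        for c in order:
            with_c = coalition | {c}
            values[c] += utility(with_c) - utility(coalition)
            coalition = with_c
    return {c: v / len(orderings) for c, v in values.items()}

# Hypothetical utility table: clients A and B hold useful data, C adds little.
ACCURACY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.6,
    frozenset({"C"}): 0.1,
    frozenset({"A", "B"}): 0.8,
    frozenset({"A", "C"}): 0.6,
    frozenset({"B", "C"}): 0.6,
    frozenset({"A", "B", "C"}): 0.8,
}
phi = shapley_values(["A", "B", "C"], ACCURACY.__getitem__)
```

The values sum to the utility of the full coalition, so they can serve directly as a budget split for incentives; practical systems usually approximate them by sampling, since exact enumeration grows factorially with the number of clients.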
Aggregation Using Advanced Mathematical Models and Learning Techniques: New
mathematical approaches, including Isolation Forest anomaly detection [240] and Hyperdimen-
sional Computing-based encoding [91], are instrumental in client aggregation. These methods
aim to reduce noise by identifying malicious models, while hyperdimensional encoding leverages
cosine similarity for effective aggregation. Additionally, Reference [30] introduces Gradient Recy-
cling, an approach combining reliable client selection with aggregation through gradient recycling
to reinforce model consistency. Methods like Reference [210] use Siamese networks for CNN-based
anomaly detection, while graph-theoretic correlation analysis [196] identifies backdoor attacks
by observing client update correlations. Reference [302] applies reverse engineering with global
reverse trigger creation to detect backdoor attacks through multi-step analysis. Reference [33]
utilizes Robust and Dynamic Gradient Filtering (RDGF) with distance-based scoring and geometric
median aggregation. Reference [264] combines clustering with cosine similarity to group
models and prevent poisoning attacks. Reference [88] employs blockchain-based PoIS consensus
to assess client reputation, and Reference [284] integrates cosine similarity-based aggregation with
Lagrange coded computing. Reference [14] uses a two-stage approach for aggregation, estimating
robust subspaces and filtering outliers, while Reference [25] employs an attention mechanism to
weigh gradients for attack resilience. Finally, Reference [292] implements adversarial gradient reg-
ularization to optimize model performance while evading detection by aggregation robustness
techniques. To this end, we provide a comprehensive comparative unit analysis indicating the
potential gaps of these approaches in Tables 11–16.

5.2.2 Aggregation Approach Modification. The development of new aggregation algorithms is
a novel approach to enhancing the robustness of FL by moving beyond traditional arithmetic
averaging. These innovative algorithms aim to improve model convergence, accuracy, and security by
aggregating client updates in a more adaptive and resilient manner. In the reviewed literature, 37
studies focus on this theme, each introducing unique aggregation techniques to address various
FL challenges.
Robust Statistical Aggregation Methods: Robust statistical techniques improve the aggrega-
tion process by limiting the influence of outliers and malicious updates. For instance, geometric-
median-based aggregation [117] and coordinate-wise median and trimmed mean [142] enhance re-
silience by isolating benign updates from noisy or faulty contributions. M-estimation aggregation
[291] combines proximal algorithms with robust statistical metrics to handle high-dimensional
data, while correntropy-based aggregation [150] applies maximum correntropy criteria, focusing
on trusted updates in non-IID environments. The Bootstrap Median-of-Means (bMOM) [256]
also provides outlier resistance through bootstrapping, aggregating updates effectively by ignor-
ing extreme deviations. Reference [274] proposes using median-based and trimmed-mean-based
gradient descent, providing resilience to Byzantine failures by mitigating the influence of extreme
updates. Reference [228] adjusts client model weights based on probability distributions, penaliz-
ing clients with abnormal values to create a robust average. References [162] and [187] both utilize
the geometric median; Reference [162] uses it to minimize weighted Euclidean distances between
model updates, while Reference [187] iteratively applies a geometric median oracle to refine
robustness. Last, Reference [231] implements a node-wise trimmed mean by discarding extreme
weight values from each local model, reducing the potential impact of outliers in aggregation.
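Two of the robust statistics discussed above, the coordinate-wise median and the geometric median, can be sketched in a few lines. The geometric median is approximated here with the classical Weiszfeld iteration; the iteration count, the `eps` guard against division by zero, and the sample points are assumptions, and the sketch does not reproduce any specific cited method.

```python
import math
import statistics

def coordinate_median(updates):
    """Median of each coordinate taken independently across clients."""
    return [statistics.median(coord) for coord in zip(*updates)]

def geometric_median(updates, iters=100, eps=1e-12):
    """Weiszfeld iteration: repeatedly re-average the points with weights
    inversely proportional to their distance from the current estimate."""
    dim = len(updates[0])
    # Start from the ordinary mean.
    z = [sum(u[j] for u in updates) / len(updates) for j in range(dim)]
    for _ in range(iters):
        weights = [1.0 / max(math.dist(u, z), eps) for u in updates]
        total = sum(weights)
        z = [
            sum(w * u[j] for w, u in zip(weights, updates)) / total
            for j in range(dim)
        ]
    return z

# Hypothetical updates: four benign points and one far-away outlier.
updates = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [50.0, 50.0]]
mean = [sum(c) / len(c) for c in zip(*updates)]   # pulled to about (10.4, 10.4)
cm = coordinate_median(updates)                   # [1.0, 1.0]
gm = geometric_median(updates)                    # stays inside the unit square
```

Unlike the arithmetic mean, both estimators remain close to the benign cluster even though one of the five points is far away, which is the property that motivates their use in Byzantine-robust aggregation.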
Anomaly Detection and Filtering-based Aggregation: Anomaly detection plays a crucial
role in identifying and removing malicious updates. Techniques such as quadratic voting (QV)
[32] apply penalties to abnormal updates, while Selective Aggregation with Conditional Varia-
tional Autoencoders (CVAEs) [24] uses synthetic data validation to discard anomalous updates.
Two-stage aggregation methods like in Reference [254] combine gradient similarity with noise
injection, filtering out collusive behavior. Studies like confidence score-based scaling [41] adjust
aggregation weights based on gradient reliability, while reputation-based detection and DBSCAN
clustering in Reference [273] cluster benign clients, filtering outliers for robust model integrity.
Reference [223] introduces a truth inference framework that estimates client reliability based
on historical parameters to identify and exclude outliers. Reference [160] uses temporal diver-
gence calculations to filter malicious updates by evaluating the historical behavior of client models.
Reference [303] proposes a client sampling scheme using Euclidean distances to estimate adver-
sary likelihood, filtering out abnormal updates through coordinate-wise median. Reference [314]
combines multiple robust estimators, including NO-REGRET, FILTERING, and GAN-based algo-
rithms, which rely on bucketing and robust estimation subroutines to detect and exclude Byzantine
updates.
Reputation-driven and Adaptive Weighting Mechanisms: Reputation-based approaches
assign weights dynamically based on client trustworthiness. Bayesian reputation-based aggrega-
tion [106] adjusts client influence using reputation scores derived from prior performance. Simi-
larly, FedTop [233] weights client updates based on their ranking and contributions, while FedNAT
[236] uses Wasserstein distance scoring to reward high-performing clients. The Layered-Switch
aggregation in Reference [168] adds further robustness by randomizing weights across aggrega-
tion layers, increasing the difficulty of adversarial targeting. Reference [262] integrates robust nor-
mal aggregation with probabilistic perturbations and personalized model fusion, adapting weights
based on individual client performance. Reference [105] combines robust mean-based aggregation
with decentralized anomaly detection, adapting weights to counteract attacks such as poisoning
and label-flipping. Reference [280] enhances Byzantine resilience by using secure cosine similar-
ity and incentive-based weighting, rewarding clients who contribute high-quality, trustworthy
updates.
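A simple way to picture reputation-driven weighting is a Beta-reputation model, in which each client's score is the posterior mean of a Beta distribution over its history of accepted and rejected updates. The sketch below is a generic illustration under that assumption and is not the scheme of Reference [106]; the class name, the uniform Beta(1, 1) prior, and the example history are hypothetical.

```python
class BetaReputation:
    """Tracks accepted/rejected update counts per client, Beta(1, 1) prior."""

    def __init__(self):
        self.accepted = {}
        self.rejected = {}

    def record(self, client, ok):
        """Register one round's outcome for a client."""
        counts = self.accepted if ok else self.rejected
        counts[client] = counts.get(client, 0) + 1

    def score(self, client):
        """Posterior mean of the Beta distribution: a / (a + b)."""
        a = self.accepted.get(client, 0) + 1
        b = self.rejected.get(client, 0) + 1
        return a / (a + b)

def reputation_weighted_average(updates, reputation):
    """Average client updates (dict: client -> vector) weighted by reputation."""
    weights = {c: reputation.score(c) for c in updates}
    total = sum(weights.values())
    dim = len(next(iter(updates.values())))
    return [
        sum(weights[c] * u[j] for c, u in updates.items()) / total
        for j in range(dim)
    ]

# Hypothetical history: one client was accepted 8 times, another rejected 8 times.
rep = BetaReputation()
for _ in range(8):
    rep.record("honest", True)
    rep.record("flaky", False)
aggregate = reputation_weighted_average(
    {"honest": [1.0, 1.0], "flaky": [-9.0, 11.0]}, rep
)
```

Because scores change gradually with accumulated evidence, a client cannot gain large influence from a single good round, although an adversary that behaves well for a long time before attacking remains a concern for this family of methods.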
Byzantine-resilient Aggregation: Byzantine-resilient methods protect against coordinated
attacks by ensuring that only non-malicious updates are included in the aggregation. Median-
based and trimmed-mean-based defenses [199] offload Byzantine filtering to network switches,
increasing efficiency in handling large traffic volumes. The Krum approach [15] selects a single
update as the most representative for the group, reducing the risk of contamination from malicious
updates. Additionally, Convolutional Kernel Angle-based Defense Aggregation in Reference [320]
applies cosine similarity to detect malicious gradients, making it effective for high-dimensional
models under attack. Reference [135] uses a trimmed padding mean to calculate the median of
parameters, minimizing the influence of Byzantine clients by averaging only benign client updates.
Reference [203] compares Byzantine-tolerant methods such as Krum, Multi-Krum, RFA, and Norm-
Difference Clipping to handle backdoor model poisoning. Reference [90] applies the FL Trusted
Coordinates (FLTC) approach, selecting trusted coordinates based on direction and magnitude
to defend against Byzantine updates. Reference [50] explores the use of FedAvg, FedProx, and
Scaffold in non-IID settings, applying robust techniques for adversarial and Byzantine resilience.
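The selection rule behind Krum, which picks the single update closest to its nearest neighbours, can be sketched as follows. The sketch follows the commonly stated scoring rule (the sum of squared distances to the n - f - 2 nearest other updates, with f the assumed number of Byzantine clients); the sample updates and the value of `num_byzantine` are assumptions for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def krum(updates, num_byzantine):
    """Return the index of the update with the lowest Krum score, where
    score(i) is the sum of squared distances from update i to its
    n - f - 2 nearest other updates."""
    n = len(updates)
    k = n - num_byzantine - 2
    if k < 1:
        raise ValueError("Krum needs n > f + 2 updates")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(
            squared_distance(u, v) for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:k]))
    return min(range(n), key=scores.__getitem__)

# Hypothetical updates: four clustered clients and one outlier.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05], [20.0, -20.0]]
selected = krum(updates, num_byzantine=1)
chosen_update = updates[selected]
```

Because only one update is forwarded, Krum discards information from the other benign clients; Multi-Krum variants average several low-scoring updates to reduce this loss.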
Probabilistic and Randomized Aggregation Techniques: Probabilistic and randomized ap-
proaches add diversity to aggregation, reducing the likelihood of adversarial influence. Random
noise and gradient filtering [254] use noise injection to prevent malicious clients from colluding.
Methods such as Auto-Weighted Geometric Median (AutoGM) [116] use probabilistic gradi-
ent assignments to balance between global and personalized models. FedNAT [236] integrates
attention refinement and probabilistic scoring, enhancing update validity through random
perturbations. Reference [206] introduces Zeno and coordinate-wise median aggregation, where clients
are probabilistically ranked based on descent predictions in the loss function, and model weights are
combined by median for robustness.
Historical Information and Memory-based Aggregation: Historical data is leveraged in some
studies to identify and exclude unreliable updates based on
past trends. Historical Information-based Recovery in Reference [19] estimates client updates from
previous contributions, allowing for periodic correction and stability checks. Similarly, Weiszfeld
aggregation [218] applies memory-based tuning to adapt aggregation thresholds based on histori-
cal gradient patterns, providing resilience over successive FL rounds. Reference [160] uses tempo-
ral divergence and client history to adjust weights dynamically, filtering out suspicious updates
and minimizing poisoning impact. Reference [105] integrates anomaly detection with a memory-
based mechanism, creating a decentralized record to support robust mean aggregation even in the
presence of Byzantine clients.
Non-linear and Advanced Mathematical Models for Aggregation: More advanced math-
ematical approaches are adopted to process gradients in non-linear or complex forms. Selective
Maximum Correntropy Aggregation (MCA) [150] applies fixed-point iteration for optimal
aggregation, while Enhanced Weiszfeld aggregation [218] uses CNNs to refine feature selection
from image data. bMOM [256] and attention refinement with Wasserstein distance [236] further
enhance aggregation by combining advanced non-linear estimations and selective weighting. Ref-
erence [2] uses the Hessian matrix as a scaling mechanism, incorporating it into the aggregation
process to handle data quality variations across models. This approach, inspired by Quasi-Newton
methods, ensures robustness by adapting the aggregation process to different data qualities, con-
tributing to a more precise model aggregation. We provide a comprehensive comparative unit
analysis indicating the potential gaps of these approaches in Tables 17–19.
5.2.3 Aggregation Hyperparameter Tuning. The theme of hyperparameter tuning during model
aggregation in FL encompasses 52 studies, each introducing unique techniques to optimize the
aggregation process, enhancing FL’s robustness. These approaches ensure that contributions from
different clients are combined optimally, considering factors such as update quality, client reliabil-
ity, and resistance to noise or adversarial attacks.
Trust and Reputation-based Tuning: Trust-based mechanisms assign weights to clients
based on historical reliability or contributions. For instance, trust scores in Reference [237] are
estimated from discrepancies in penultimate layer representations, while reputation scores in Ref-
erence [267] rely on a Bayesian framework for client utility estimation. Studies like FLEvaluate
[62] apply fuzzy evaluation for trust-based aggregation, while ShapleyFL [219] uses Shapley value
calculations to weigh client contributions adaptively. Trust-based model updates are refined in
FedValidate [321], which validates client credibility before aggregation. Additionally, trustworthy
model updates [318] leverage local and global update evaluations in industrial cyber-physical sys-
tems, addressing free-riders and poisoning attacks. Reference [11] uses cosine similarity to gauge
client alignment, adjusting learning rates and calculating reputation scores to select reliable local
models. Reference [118] implements an auto-weighted FL approach that adjusts client weights ac-
cording to empirical risk, penalizing clients with low performance, while Reference [237] estimates
trust scores based on penultimate layer representations, emphasizing alignment with trustworthy
clients to ensure robust updates in aggregation. Reference [208] applies reputation-based weighting to
gradients and uses a flip-score to detect malicious clients, but it depends
on the assumption that gradient flips are rare under benign conditions and may not cover scenarios
where benign clients exhibit similar behavior.
Weighted and Dynamic Aggregation: Studies focusing on weighted aggregation use dy-
namic, context-sensitive weights to improve robustness. For instance, contribution-wise weighted
aggregation in Reference [125] assigns scores based on client contributions and uses a trimmed-
normalized aggregation rule to counter scaling attacks. Cosine similarity-based weighting in Ref-
erence [139] measures both Euclidean distance and cosine similarity, refining client selection.
Weighted aggregation using final-layer adjustments [270] identifies malicious models by embed-
ding backdoor triggers, while FL-GQM [139] uses distributed scoring to increase resistance to ad-
versarial clients. Reference [108] prioritizes clients with higher rewards, reducing cooperation with
lower-reward clients for network efficiency. Reference [110] applies Shapley values to measure
sample contributions, re-weighting updates based on individual data relevance, while Reference
[96] leverages k-reliable neighbors to obtain clean and similar data for reliable training,
dynamically adjusting weights based on data quality and relevance. Reference [114] classifies DNN
parameters into critical and noncritical, applies different update rules, and uses weighted aggregation
based on learning efficiency, but it depends on accurate classification of parameters
and may not perform well under extreme noise. Reference [309] utilizes existing robust
aggregation rules such as the trimmed mean. Reference [83] uses hierarchical fusion of heterogeneous
models to mitigate noise, employing a heteroArc fusion layer to aggregate them.
Clustering and Group-based Hyperparameter Tuning: Several studies use clustering to
separate benign from malicious updates. Agglomerative Hierarchical Clustering (AHC) in
Reference [122] leverages clustering techniques to enhance Byzantine robustness. Mini-FL [129]
clusters clients by non-IID features (e.g., geographic, temporal) for grouped model aggregation,
while two-stage clustering and screening in Reference [124] uses centerpoint aggregation, combin-
ing clustering with discrete geometry to handle large-scale models. Clustering-based Byzantine-
robust aggregation is further explored in Reference [175], which applies coordinate-wise screening
alongside centerpoint aggregation. Reference [20] clusters clients into groups for group-wise
aggregation, training separate global models and using majority voting to classify test inputs.
Reference [112] uses metric learning combined with unsupervised clustering (K-means) to detect and
filter malicious data based on learned representations. Similarly, Reference [277] clusters model
parameters, performing robust aggregation within each cluster to ensure resilience against
Byzantine and adversarial attacks. Reference [60] determines reconstruction errors of updates using a
variational autoencoder, discarding updates with excessive errors.
Game-theoretic and Anomaly Detection Tuning: Game-theoretic approaches identify and discard
unreliable clients. For instance, game-theoretic filtering in Reference [221] detects malicious
updates, while anomaly scores
in Reference [281] and RONI/TRIM in Reference [45] identify updates with high error rates or
inconsistencies. Homomorphic encryption and secret sharing [71] strengthen the privacy and ro-
bustness of client updates, while Moreau envelope smoothing [278] mitigates noisy updates using
an envelope-smoothed approximation. Reference [131] employs spatial-temporal pattern analysis
to detect anomalous gradients, filtering them through cosine similarity and clustering. Reference
[77] uses Isolation Forest-based anomaly detection, building decision trees and assigning anomaly
scores to client updates so that malicious updates can be trimmed. Reference [193] applies a supervised
anomaly detector to filter harmful updates, distinguishing benign from malicious updates to ensure a
cleaner model aggregation process.
Privacy-enhanced Tuning Using Encryption: Encryption-based tuning strategies provide
robustness against attacks by obscuring updates. For example, distributed Paillier encryption with
zero-knowledge proofs [155] ensures privacy and Byzantine resistance. Studies like Mask-NFRT
[71] employ secret sharing for dropout tolerance, while double-masking techniques [149] secure
client updates with pseudo-random generator-based encryption. Privacy is further enhanced in dif-
ferential privacy frameworks such as DPFA and DPLoss [51], which implement differential privacy
within aggregation. Reference [53] uses a verification dataset and confidence scores to identify ma-
licious clients, enhancing aggregation security while requiring a private shared dataset. Reference
[268] maintains uniform decision boundaries by exchanging class-wise centroids between clients
and the server, synchronizing local model features for consistency despite noisy labels, though this
approach requires sharing some client data features with the server.
Trust Metrics and Consensus Mechanisms: Some approaches rely on trust metrics and consensus to filter malicious updates.
Consensus mechanisms in Reference [18] align global models based on majority voting, and alarm-
based anomaly detectors [64] detect poisoned updates for exclusion. Client trust and consensus
with geometric median [38] uses Hamming distance with a clean dataset to bootstrap trust scores
and support secure aggregation. Reference [188] evaluates client reliability by assessing param-
eter alignment with the server’s values, allowing the exclusion of clients with weak alignment.
Reference [214] uses historical Euclidean distances to differentiate between benign and malicious
updates, assuming that benign clients have more consistent update distances, which guides the
server in selecting trustworthy updates.
Adaptive and Sensitivity-based Tuning: Adaptive methods adjust model hyperparameters
based on sensitivity to attack parameters or specific client behaviors. For example, adaptive di-
vergence scoring [5] uses divergence and outlier scores to adjust weighted averaging. Sensitivity
analysis-based Krum [99] estimates attacker count to adapt Krum’s aggregation approach, ensur-
ing robust aggregation even with Byzantine adversaries. Reliability-based weighting [43] applies
a mixture of experts to score updates adaptively based on model quality. Reference [212] adapts
privacy budgets based on Wasserstein distance, generating uncertainty sets that account for indi-
vidual client variability and optimizing model performance. Reference [207] uses a tracking-based
stochastic approximation to offset noise impacts in over-the-air transmissions, adapting the model
for robust convergence under noisy channel conditions, particularly useful in decentralized net-
works with transmission constraints.
Cryptographic Protocols for Tuning: Cryptographic protocols support robust tuning in high-
security FL setups. Additive Secret Sharing and Oblivious Transfer in Reference [149] leverage
cryptographic measures for aggregation, while Zero-knowledge Proofs with Paillier encryption
[155] ensure privacy and defend against Byzantine attacks. Federated Aggregation with Homomor-
phic Encryption [126] applies encryption to protect against model poisoning attacks, particularly
for non-IID data setups. Reference [26] employs hierarchical aggregation with sub-aggregators,
where users send split parameters to sub-aggregators before the final consolidation, boosting pri-
vacy in FL. Reference [241] addresses communication quality by jointly allocating bandwidth and
quantization bits, reducing transmission errors and ensuring reliable updates. Reference [216] as-
signs transmission time slots to client groups and applies robust aggregation to mitigate Byzantine
attacks, promoting secure aggregation across heterogeneous communication channels. To summa-
rize, we provide a comprehensive comparative unit analysis indicating the potential gaps of these
approaches in Tables 20–23.

6 Open Issues and Potential Solutions


This section identifies key challenges and outlines potential solutions in robust FL. The aim is to
offer detailed insights and derive new directions from the systematic analysis in this survey,
covering specific issues, solutions, and technical details.

6.1 Advancing Robust and Secure Model Aggregation Algorithms


Traditional aggregation techniques like FedAvg inadequately handle the multifaceted challenges
of data heterogeneity, malicious updates, and adversarial attacks in FL. Current solutions often

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.

specialize in addressing isolated issues; for instance, robust aggregation methods such as Krum
and Trimmed Mean focus on Byzantine robustness but may fail under non-IID data distributions.
To provide comprehensive security, advanced aggregation algorithms must incorporate multi-
dimensional strategies. One promising direction is the development of adaptive weighting schemes
that dynamically adjust client contributions based on trust metrics derived from client behavior,
such as update consistency, loss convergence patterns, or model divergence. Existing robust
estimators such as coordinate-wise median aggregation can mitigate the impact of outliers but may
be computationally intensive in high-dimensional models. Integrating anomaly detection mechanisms into
the aggregation process can help identify and exclude malicious updates. Future work should focus
on scalable algorithms that minimize computational overhead while effectively balancing robust-
ness and accuracy, possibly through techniques such as gradient sparsification or dimensionality
reduction in the aggregation process. Additionally, secure multi-party computation (SMPC)
protocols can be employed to securely compute aggregates without exposing individual updates,
enhancing both security and privacy.
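To make the contrast between plain averaging and robust aggregation concrete, the following minimal Python sketch implements coordinate-wise median and trimmed mean over a handful of client updates. The function names, the `trim` parameter, and the toy update values are assumptions made for illustration; they are not taken from any surveyed method.

```python
from statistics import median


def coordinate_wise_median(updates):
    """Aggregate client updates by taking the median of each coordinate."""
    return [median(coord) for coord in zip(*updates)]


def trimmed_mean(updates, trim):
    """Drop the `trim` smallest and largest values per coordinate, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim is too large for the number of clients")
    result = []
    for coord in zip(*updates):
        kept = sorted(coord)[trim:len(coord) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four benign clients and one client sending an extreme update (toy values).
updates = [
    [0.10, -0.20],
    [0.12, -0.18],
    [0.09, -0.22],
    [0.11, -0.19],
    [50.0, 50.0],  # hypothetical malicious update
]
plain_mean = [sum(c) / len(c) for c in zip(*updates)]
robust_median = coordinate_wise_median(updates)
robust_trimmed = trimmed_mean(updates, trim=1)
```

In this toy setting the plain mean is pulled far away from the benign values, whereas both robust aggregates stay close to them. Each coordinate is sorted independently, which hints at why the cost grows with model dimension.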

6.2 Integrating Hybrid Server-client Defense Mechanisms


The current focus on either server-side (e.g., robust aggregation) or client-side (e.g., local differ-
ential privacy) defenses in FL creates vulnerabilities exploitable by sophisticated attackers who
can circumvent isolated defenses. Hybrid approaches that integrate both server and client per-
spectives are critical for establishing a resilient FL framework. For instance, implementing se-
cure aggregation protocols like Bonawitz et al.’s Secure Aggregation allows clients to jointly
compute aggregates without revealing individual updates, while the server can apply anom-
aly detection on aggregated results using statistical tests or machine learning classifiers. Tech-
niques like Federated Adversarial Training (FAT), where both clients and the server partici-
pate in generating and defending against adversarial examples, can enhance robustness against
distributed attacks. Additionally, feedback loops between the server and clients, where the server
provides feedback on the quality or trustworthiness of client updates (e.g., through reputation
scores or trust metrics), can enable clients to adjust their local training strategies dynamically.
This co-design of detection algorithms leveraging both global and local information can im-
prove the system’s ability to identify and mitigate threats, such as backdoor attacks or model
poisoning.
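The idea that clients can jointly compute an aggregate without revealing individual updates can be sketched with pairwise masks that cancel in the sum. The Python example below is a heavily simplified illustration of that masking idea only: it assumes quantized integer updates, no client dropouts, and masks drawn from a non-cryptographic generator, and it omits the key agreement and dropout recovery that a real protocol needs. All names (`pairwise_masks`, `mask_update`, `MODULUS`) are hypothetical.

```python
import random

MODULUS = 2**32  # arithmetic is done modulo this value (illustrative choice)


def pairwise_masks(num_clients, dim, rng):
    """Return one random mask vector for every pair of clients (i, j) with i < j."""
    return {
        (i, j): [rng.randrange(MODULUS) for _ in range(dim)]
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }


def mask_update(client, update, masks, num_clients):
    """Add masks shared with higher-indexed peers and subtract those shared with lower ones."""
    masked = list(update)
    for peer in range(num_clients):
        if peer == client:
            continue
        pair = (min(client, peer), max(client, peer))
        sign = 1 if client < peer else -1
        masked = [(m + sign * s) % MODULUS for m, s in zip(masked, masks[pair])]
    return masked


def aggregate(masked_updates):
    """Server-side sum; the pairwise masks cancel, leaving only the sum of updates."""
    return [sum(coord) % MODULUS for coord in zip(*masked_updates)]


rng = random.Random(0)  # NOT cryptographically secure; for illustration only
updates = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]  # quantized toy updates
masks = pairwise_masks(len(updates), 3, rng)
masked = [mask_update(i, u, masks, len(updates)) for i, u in enumerate(updates)]
total = aggregate(masked)
```

The server only sees the masked vectors, yet `total` equals the element-wise sum of the raw updates. Any server-side anomaly detection in such a design therefore has to work on the aggregate rather than on individual contributions.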

6.3 Harnessing Ensemble Learning Techniques for Federated Robustness


Ensemble learning can effectively address the diversity of challenges faced in FL by integrating
multiple robust techniques into a cohesive framework. Beyond traditional majority voting or aver-
aging, ensemble-based approaches in FL can leverage advanced methods like Federated Stacking,
where a global meta-model learns from the outputs of diverse client models to enhance resilience
against attacks. For example, each client could train a local model with different architectures
or hyperparameters, and the server could combine their predictions using weighted ensembling
based on performance metrics such as accuracy or loss. By combining diverse defense mechanisms
tailored to attack types and data distributions, ensemble learning improves robustness while re-
ducing reliance on any single strategy. Additionally, incorporating techniques like bagging and
boosting in the federated context can help mitigate overfitting and improve generalization. The
integration of model diversity metrics during aggregation, such as measuring cosine similarity
or kernel alignment between updates, can further bolster FL defenses against adversarial attacks
and heterogeneous data challenges, ensuring that the ensemble maintains high performance even
under attack.
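A minimal sketch of performance-weighted ensembling is given below, assuming each client model outputs class probabilities for the same input and that the server holds a validation accuracy for each client. The function name, the accuracy values, and the probability vectors are illustrative assumptions.

```python
def weighted_ensemble(client_probs, client_accuracies):
    """Combine per-client class probabilities, weighting each client by its accuracy."""
    total = sum(client_accuracies)
    weights = [a / total for a in client_accuracies]
    num_classes = len(client_probs[0])
    combined = [
        sum(w * probs[c] for w, probs in zip(weights, client_probs))
        for c in range(num_classes)
    ]
    predicted = max(range(num_classes), key=combined.__getitem__)
    return combined, predicted


client_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],  # hypothetical low-quality or poisoned model
]
client_accuracies = [0.90, 0.85, 0.30]
combined, predicted = weighted_ensemble(client_probs, client_accuracies)
```

Because the third model has low validation accuracy, its confident vote for class 2 is outweighed and the ensemble predicts class 0. This relies on the server having a trustworthy validation signal, which is itself an assumption.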


6.4 Developing Adaptive Mechanisms for Dynamic Threats in FL


FL systems operate in dynamic environments where threats evolve rapidly, and static defense
mechanisms may struggle to adapt to such changes. Incorporating adaptive strategies that respond
to observed behavior and performance metrics is essential. Adaptive learning rates for client train-
ing, adjusted based on local loss convergence or gradient variance, can help maintain stability and
robustness. Techniques like Reinforcement Learning (RL) can be employed to dynamically ad-
just hyperparameters such as the number of local epochs, batch size, or regularization strength
in response to changing threat levels. For example, if a client’s updates deviate significantly from
the global model, then the system could reduce that client’s influence or increase scrutiny of its
updates through adaptive weighting or outlier detection algorithms like Local Outlier Factor
(LOF). Adaptive client selection mechanisms can prioritize clients with more reliable updates or
diverse data, improving the robustness and fairness of the aggregated model. Furthermore, imple-
menting online anomaly detection algorithms that continuously monitor model updates for signs
of poisoning or backdoor attacks can enable timely responses to evolving adversarial threats, uti-
lizing techniques such as Principal Component Analysis (PCA) for dimensionality reduction
and anomaly detection.
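The idea of reducing a client's influence when its update deviates strongly can be sketched with a simple distance-based weighting rule. The example below is not LOF or PCA; it is a deliberately simple stand-in that scales each client's weight by its Euclidean distance to a reference update (here the coordinate-wise median, an assumed choice). All function names and values are hypothetical.

```python
import math
from statistics import median


def distance(u, v):
    """Euclidean distance between two update vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def adaptive_weights(updates, reference):
    """Weight each client by how close its update is to the reference update.

    Distances are divided by their median so the rule does not depend on scale.
    """
    dists = [distance(u, reference) for u in updates]
    scale = median(dists) or 1.0  # avoid division by zero when all updates agree
    raw = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_average(updates, weights):
    """Aggregate updates using the given per-client weights."""
    return [sum(w * x for w, x in zip(weights, coord)) for coord in zip(*updates)]


updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [10.0, -10.0]]  # last one deviates
reference = [median(coord) for coord in zip(*updates)]
weights = adaptive_weights(updates, reference)
aggregated = weighted_average(updates, weights)
```

The deviating client receives a weight close to zero, so the aggregate stays near the benign updates. Recomputing the weights every round makes the rule adaptive, although an attacker who stays close to the reference would not be caught by it.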

6.5 Designing Lightweight and Efficient Encryption Protocols for FL


Integrating encryption into FL to ensure security and privacy often introduces significant com-
putational overhead, particularly in resource-constrained edge environments. To address this,
lightweight symmetric encryption schemes like stream ciphers (e.g., ChaCha20, Salsa20) can be
employed for encrypting model updates, reducing computational load compared to more heavy-
weight asymmetric schemes. Additionally, leveraging hardware acceleration available on modern
devices (e.g., ARM processors with NEON SIMD instructions) can improve the efficiency of
cryptographic operations. Exploring hybrid encryption schemes that combine symmetric encryption for
data transmission and homomorphic encryption for secure computation can balance security and
performance. For instance, additively homomorphic schemes such as Paillier can enable secure
aggregation of encrypted model updates without decrypting them, although they are limited to
additive operations; RSA, by contrast, is only multiplicatively homomorphic and does not directly
support summing updates. Advancements in efficient SMPC protocols, such as secret sharing
schemes (e.g., Shamir’s Secret Sharing), can facilitate secure aggregation with lower
computational overhead. The development of these lightweight encryption protocols is instru-
mental in making FL scalable and practical in real-world deployments where clients have limited
computational resources and energy constraints.
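To illustrate why secret sharing suits aggregation, the following sketch implements Shamir's Secret Sharing over a prime field and uses its linearity: adding the shares held by each share-holder yields shares of the sum. The field modulus, the threshold of 2, and the scalar toy updates are assumptions made for illustration; a deployment would also need quantization of real-valued updates and authenticated channels, which are omitted here.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def make_shares(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def poly(x):
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME

    return [(x, poly(x)) for x in range(1, num_shares + 1)]


def reconstruct(shares):
    """Recover the secret with Lagrange interpolation evaluated at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


client_values = [12, 7, 30]  # quantized scalar updates (toy values)
all_shares = [make_shares(v, threshold=2, num_shares=3) for v in client_values]

# Share-holder k adds the k-th share from every client without seeing any raw value.
summed = [
    (holder[0][0], sum(share[1] for share in holder) % PRIME)
    for holder in zip(*all_shares)
]
recovered_sum = reconstruct(summed[:2])  # any 2 of the 3 summed shares suffice
```

Only field additions are needed on the share-holders' side, which is the main reason such schemes tend to be cheaper than homomorphic encryption for summation.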

6.6 Enhancing Real-time and Online Robustness in FL


FL’s distributed nature and asynchronous operation make it susceptible to adversarial attacks and
concept drift, especially in scenarios where data distributions change over time (non-stationary
environments). To enhance real-time robustness, implementing asynchronous model aggregation
schemes like FedAsync can reduce delays associated with synchronous updates and allow faster
integration of client updates. However, asynchronous updates introduce challenges in maintaining
model consistency and convergence, which can be addressed using staleness-aware aggregation
methods that weigh updates based on their freshness. Hybrid adversarial detection mechanisms
that combine real-time monitoring with predictive analytics can counter threats effectively. For
example, real-time anomaly detection algorithms can flag suspicious updates based on statistical
deviations using techniques like the Sequential Probability Ratio Test (SPRT), while predic-
tive models using Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM)
networks anticipate potential attacks based on historical patterns. Incorporating online learning
methods, such as Online Gradient Descent or adaptive forgetting factors, can help models adapt to
concept drift by giving more weight to recent data. Techniques like Elastic Weight Consolida-
tion (EWC) can prevent catastrophic forgetting when adapting to new data distributions. These
strategies improve robustness against evolving data distributions and adversarial patterns, ensur-
ing that the FL system remains effective over time.
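Staleness-aware aggregation can be sketched as a mixing rule in which the weight given to an incoming client model shrinks with its age. The example below uses a polynomial discount, which is one common choice; the base mixing rate, the decay exponent, and the function names are assumptions for illustration rather than the settings of any specific method.

```python
def staleness_weight(base_alpha, staleness, decay=0.5):
    """Polynomial discount: fresher updates (small staleness) get a larger mixing weight."""
    return base_alpha * (staleness + 1) ** (-decay)


def async_merge(global_model, client_model, current_round, client_round, base_alpha=0.6):
    """Blend a possibly stale client model into the global model."""
    alpha = staleness_weight(base_alpha, current_round - client_round)
    merged = [(1 - alpha) * g + alpha * c for g, c in zip(global_model, client_model)]
    return merged, alpha


global_model = [0.0, 0.0]
fresh, alpha_fresh = async_merge(global_model, [1.0, 1.0], current_round=10, client_round=10)
stale, alpha_stale = async_merge(global_model, [1.0, 1.0], current_round=10, client_round=1)
```

A model trained on the current global state is mixed in with the full base rate, while one that is nine rounds old contributes roughly a third as much. The decay exponent controls how aggressively old updates are discounted and would need tuning in practice.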

6.7 Balancing Performance, Security, and Fairness in FL


Tradeoffs between robustness, accuracy, and fairness remain a critical challenge in FL. For instance,
employing strong differential privacy mechanisms enhances privacy but can degrade model accuracy
due to added noise. To address this, multi-objective optimization frameworks that explicitly
account for these tradeoffs are needed. Criteria such as Pareto optimality can help identify
solutions in which no objective can be improved without worsening another. RL can be used to
dynamically adjust defense mechanisms
and hyperparameters based on their impact on performance metrics. For example, an RL agent
could learn to adjust the level of noise added for privacy based on the current model accuracy
and fairness metrics. Personalized FL approaches, where models are tailored to individual client
data distributions and requirements, can reduce disparities while maintaining overall robustness.
Federated Multi-Task Learning (FMTL) allows different clients to have models customized to
their specific tasks, improving fairness and potential robustness. Additionally, fairness-aware FL
algorithms that incorporate fairness constraints into the optimization objective can help ensure
equitable performance across clients. Techniques like Adaptive Weighting can balance the con-
tributions from different clients based on data quality, contribution to model performance, and
adherence to fairness criteria.
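The role of Pareto optimality in balancing accuracy, privacy, and fairness can be illustrated by filtering candidate configurations down to the non-dominated ones. In the sketch below, each hypothetical configuration is scored as (accuracy, negative privacy budget, negative fairness gap) so that larger is better on every axis; the configuration names and scores are invented for illustration.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere.

    All objectives are expressed so that larger values are better.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }


# Hypothetical configurations scored as (accuracy, -epsilon, -fairness_gap).
candidates = {
    "no_noise": (0.92, -8.0, -0.10),
    "mild_noise": (0.89, -2.0, -0.08),
    "heavy_noise": (0.75, -0.5, -0.12),
    "bad_config": (0.74, -2.5, -0.15),
}
front = pareto_front(candidates)
```

Here `bad_config` is dropped because `mild_noise` is better on all three objectives, while the remaining three represent genuinely different tradeoffs. Choosing among them still requires a preference, for example one learned by an RL agent or set by policy.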

6.8 Evolving FL Architectures for Scalability and Efficiency


The traditional star-based architecture of FL, where a central server coordinates all clients, is in-
creasingly inadequate for large-scale deployments with diverse clients due to communication bot-
tlenecks and single points of failure. Decentralized architectures, such as peer-to-peer networks
or gossip-based protocols, can reduce reliance on a central server and enhance scalability. For in-
stance, Decentralized FL (DFL) allows clients to communicate and aggregate models with their
neighbors in a network graph using algorithms such as Push-Sum or Gossip Averaging. Hierar-
chical FL architectures introduce intermediate aggregation layers, where edge servers or regional
aggregators collect updates from nearby clients before passing them to the central server. This ap-
proach reduces communication overhead and can handle data heterogeneity more effectively by
grouping clients with similar data distributions. Blockchain-enabled FL frameworks offer a novel
approach to improving transparency and trustworthiness in data aggregation and transmission.
Smart contracts can automate and secure the aggregation process, while consensus mechanisms
such as Proof of Stake (PoS) or Practical Byzantine Fault Tolerance (PBFT) can ensure the in-
tegrity of updates. These architectural innovations are critical for adapting FL systems to complex
and large-scale applications, but they also introduce new challenges in terms of latency, security,
and protocol design that must be addressed.
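Gossip-style averaging without a central server can be sketched on a small ring network, where each client repeatedly averages its value with those of its two neighbors. The topology, the scalar stand-ins for model parameters, and the number of rounds are assumptions for illustration. Uniform neighbor averaging preserves the global mean only because every node in a ring has the same degree; irregular graphs would need different mixing weights or a scheme such as Push-Sum.

```python
def gossip_round(values, neighbors):
    """Each node replaces its value with the average of itself and its neighbors."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]


# Ring of five clients; each talks only to its two adjacent peers.
n = 5
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
values = [1.0, 2.0, 3.0, 4.0, 10.0]  # scalar stand-ins for model parameters
target = sum(values) / n  # the value a central server would compute

for _ in range(50):
    values = gossip_round(values, neighbors)
```

After enough rounds every client holds approximately the same value as centralized averaging would produce, at the cost of multiple communication rounds. This is the latency tradeoff noted above.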

6.9 Achieving Cross-domain Robustness in FL


Cross-domain FL faces significant challenges due to differences in data characteristics, feature
spaces, label distributions, and security requirements across domains. Transfer learning tech-
niques, such as fine-tuning a pre-trained model on new domain-specific data, can enhance
cross-domain robustness. In FL, this could involve clients in a new domain adapting the global
model by fine-tuning it on their local data while still contributing to the global update. Meta-learning
approaches, such as Model-Agnostic Meta-Learning (MAML), where the model is trained to be
easily adaptable to new tasks, can be effective in enabling models to quickly adapt to new domains
with minimal data. Joint learning frameworks that facilitate collaboration across domains without
compromising privacy and security, such as Federated Transfer Learning (FTL), represent a
promising direction for future research. FTL can leverage knowledge from related domains even
when there is little overlap in data features or labels. Techniques such as domain adaptation and
adversarial learning can help align feature spaces across domains, improving the generalization
of the global model. Additionally, incorporating multi-view learning or multi-modal learning can
further enhance the model’s ability to handle heterogeneous data from different domains.
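As a minimal illustration of adapting a global model to a new domain by local fine-tuning, the sketch below starts from assumed global weights of a linear model and runs plain gradient descent on a client's local data. The linear model, learning rate, step count, and toy data are assumptions chosen to keep the example small; it does not implement MAML or FTL.

```python
def predict(weights, x):
    """Linear model: dot product of weights and features (last feature acts as bias)."""
    return sum(w * xi for w, xi in zip(weights, x))


def mse(weights, data):
    """Mean squared error of the model on (features, target) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(global_weights, local_data, lr=0.05, steps=2000):
    """Start from the global weights and take gradient steps on the local squared error."""
    w = list(global_weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in local_data:
            err = predict(w, x) - y
            for k, xk in enumerate(x):
                grad[k] += 2 * err * xk / len(local_data)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


global_weights = [1.0, 0.0]  # learned on a source domain (toy values)
# Local target-domain data follows y = 2x + 1, which differs from the global model.
local_data = [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([3.0, 1.0], 7.0)]
local_weights = fine_tune(global_weights, local_data)
```

The fine-tuned weights fit the local domain far better than the global ones. In a federated setting, the client could report only the resulting weight difference as its contribution to the next global update.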

6.10 Providing Theoretical Guarantees for Robust FL


Despite empirical advancements, robust FL lacks comprehensive theoretical foundations to vali-
date its effectiveness under diverse scenarios. Rigorous definitions of robustness metrics, encom-
passing both data heterogeneity and adversarial resilience, are essential. Establishing theoretical
bounds for attack resilience involves analyzing the convergence properties of FL algorithms in the
presence of malicious clients and quantifying the maximum allowable proportion of adversaries
for which convergence can still be guaranteed (Byzantine fault tolerance thresholds). Additionally,
deriving convergence rates under non-IID data distributions and asynchronous updates is critical.
Developing certified verification frameworks for robustness, akin to formal methods in software
verification, can ensure reliable deployment of FL systems in critical applications. Techniques from
robust statistics, such as influence functions and robustness measures (e.g., breakdown point, in-
fluence curve), can be adapted to analyze the robustness of FL algorithms. Formalizing privacy
guarantees through differential privacy and understanding their impact on convergence and ro-
bustness is also important. Furthermore, applying probabilistic analysis and concentration inequal-
ities (e.g., Hoeffding’s inequality and Bernstein’s inequality) can provide insights into the behav-
ior of FL algorithms under uncertainty. These theoretical foundations will provide a quantitative
benchmark for evaluating FL methods and will help in designing algorithms with provable security
and performance guarantees, essential for critical applications such as healthcare and autonomous
systems.
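As an illustration of how concentration inequalities yield quantitative statements, consider a single coordinate of the averaged update and assume, for this sketch only, that the contributions $X_1, \dots, X_n$ of $n$ benign clients are independent and bounded in $[a, b]$. Hoeffding's inequality then bounds the deviation of the average from its expectation:

```latex
% Hoeffding's inequality for independent X_1, ..., X_n with a <= X_i <= b
\[
  \Pr\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i
      - \mathbb{E}\left[ \frac{1}{n}\sum_{i=1}^{n} X_i \right] \right| \ge t \right)
  \le 2\exp\left( -\frac{2 n t^{2}}{(b-a)^{2}} \right).
\]
% Setting the right-hand side equal to \delta and solving for t:
\[
  t = (b-a)\sqrt{\frac{\ln(2/\delta)}{2n}} .
\]
```

Under these assumptions, with probability at least $1-\delta$ the averaged coordinate lies within $t$ of its expectation, so the error shrinks at a rate of $1/\sqrt{n}$. Independence and boundedness rarely hold exactly under non-IID data or in the presence of adversaries, which is one reason FL-specific analyses are needed.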

7 Conclusion
This article has presented a comprehensive and insightful exploration into the realm of robust
FL through an SLR guided by the PRISMA framework. The review’s meticulous analysis has un-
veiled a diverse array of approaches and methodologies dedicated to enhancing the resilience of
FL models in the face of adversarial challenges, noise, and the complexities of decentralized data.
The identified eight themes—ranging from innovative objective regularization and optimizer mod-
ification strategies to the implementation of differential privacy, client selection, new aggregation
algorithms, and meticulous hyperparameter tuning during model aggregation—collectively depict
the breadth and depth of strategies employed by researchers to ensure the robustness of FL. By
synthesizing these themes, this review not only encapsulates the present state-of-the-art but also
paves the way for the future of research in the domain.
This SLR also points towards promising directions for further investigation. It underscores
the potential of hybrid models, ensemble methods, and adaptive mechanisms to bolster the re-
silience of FL systems even further. The convergence of these methods, combined with advance-
ments in data privacy and security, has the potential to address the complex challenges posed
by adversaries and heterogeneity in decentralized data sources. As the FL landscape evolves
rapidly, this review stands as an invaluable resource, guiding researchers, practitioners, and pol-
icymakers toward developing robust FL solutions. The insights derived from this analysis can
serve as a foundational framework for the creation of more secure and reliable FL systems, ulti-
mately shaping the future landscape of decentralized ML in the pursuit of privacy, security, and
performance.

References
[1] Amina Adadi. 2021. A survey on data-efficient algorithms in big data era. J. Big Data 8, 1 (2021), 24.
[2] Adnan Ahmad, Wei Luo, and Antonio Robles-Kelly. 2023. Robust federated learning under statistical heterogene-
ity via Hessian spectral decomposition. Pattern Recog. 141 (2023), 109635. DOI:[Link]
109635
[3] Wesam Salah Alaloul, M. S. Liew, Noor Amila Wan Abdullah Zawawi, and Ickx Baldwin Kennedy. 2020. Industrial
revolution 4.0 in the construction industry: Challenges and opportunities for stakeholders. Ain Shams Eng. J. 11, 1
(2020), 225–230.
[4] Ebtisaam Alharbi, Leandro Soriano Marcolino, Antonios Gouglidis, and Qiang Ni. 2023. Robust federated learning
method against data and model poisoning attacks with heterogeneous data distribution. In ECAI 2023. IOS Press,
85–92.
[5] Naif Alkhunaizi, Dmitry Kamzolov, Martin Takáč, and Karthik Nandakumar. 2022. Suppressing poisoning attacks on
federated learning for medical imaging. In Medical Image Computing and Computer Assisted Intervention – MICCAI
2022, Linwei Wang, Qi Dou, P. Thomas Fletcher, Stefanie Speidel, and Shuo Li (Eds.). Springer Nature Switzerland,
Cham, 673–683.
[6] Suzan Almutairi and Ahmed Barnawi. 2023. Federated learning vulnerabilities, threats and defenses: A systematic
review and future directions. Internet Things 24 (2023), 100947.
[7] Fan Ang, Li Chen, and Weidong Wang. 2020. Robust federated learning under worst-case model. In IEEE Wire-
less Communications and Networking Conference (WCNC’20). 1–6. DOI:[Link]
9120713
[8] Fan Ang, Li Chen, Nan Zhao, Yunfei Chen, Weidong Wang, and F. Richard Yu. 2020. Robust federated learning
with noisy communication. IEEE Trans. Commun. 68, 6 (2020), 3452–3464. DOI:[Link]
2979149
[9] Angelos Angelopoulos, Emmanouel T. Michailidis, Nikolaos Nomikos, Panagiotis Trakadas, Antonis Hatziefremidis,
Stamatis Voliotis, and Theodore Zahariadis. 2019. Tackling faults in the Industry 4.0 era—A survey of machine-
learning solutions and key aspects. Sensors 20, 1 (2019), 109.
[10] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, and Jorge Luis Reyes-Ortiz. 2013. A public domain
dataset for human activity recognition using smartphones. In European Symposium on Artificial Neural Networks,
Computational Intelligence and Machine Learning.
[11] Sana Awan, Bo Luo, and Fengjun Li. 2021. CONTRA: Defending against poisoning attacks in federated learning. In
Computer Security – ESORICS 2021, Elisa Bertino, Haya Shulman, and Michael Waidner (Eds.). Springer International
Publishing, Cham, 455–475.
[12] Chunguang Bai, Patrick Dallasega, Guido Orzes, and Joseph Sarkis. 2020. Industry 4.0 technologies assessment: A
sustainability perspective. Int. J. Product. Econ. 229 (2020), 107776.
[13] Jeeyun Sophia Baik. 2020. Data privacy against innovation or against discrimination?: The case of the California
Consumer Privacy Act (CCPA). Telemat. Inform. 52 (2020), 101431.
[14] Wenxuan Bao, Jun Wu, and Jingrui He. 2024. BOBA: Byzantine-robust federated learning with label skewness. In
International Conference on Artificial Intelligence and Statistics. PMLR, 892–900.
[15] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. 2017. Machine learning with adversaries:
Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg,
S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. Retrieved
from [Link]
[16] Du Bowen, Wang Haiquan, Li Yuxuan, Jiejie Zhao, Yanbo Ma, and Huang Runhe. 2024. Fair and robust federated
learning via decentralized and adaptive aggregation based on blockchain. ACM Trans. Sensor Netw. (2024), 1–25.
[17] Xiaoyu Cao, Minghong Fang, Jia Liu, and Neil Zhenqiang Gong. 2022. FLTrust: Byzantine-robust federated learning
via trust bootstrapping. arXiv:2012.13995
[18] Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. 2021. Provably secure federated learning against malicious clients.
In AAAI Conference on Artificial Intelligence, Vol. 35. 6885–6893.
[19] Xiaoyu Cao, Jinyuan Jia, Zaixi Zhang, and Neil Zhenqiang Gong. 2023. FedRecover: Recovering from poisoning
attacks in federated learning using historical information. In IEEE Symposium on Security and Privacy (SP’23). IEEE,
1366–1383.


[20] Xiaoyu Cao, Zaixi Zhang, Jinyuan Jia, and Neil Zhenqiang Gong. 2022. FLCert: Provably secure federated learning
against poisoning attacks. IEEE Trans. Inf. Forens. Secur. 17 (2022), 3691–3705. DOI:[Link]
3212174
[21] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto,
and Lenka Zdeborová. 2019. Machine learning and the physical sciences. Rev. Mod. Phys. 91, 4 (2019), 045002.
[22] Ferhat Ozgur Catak and Murat Kuzlu. 2024. A federated adversarial learning approach for robust spectrum sensing.
In 13th Mediterranean Conference on Embedded Computing (MECO’24). IEEE, 1–4.
[23] Rishikanth Chandrasekaran, Kazim Ergun, Jihyun Lee, Dhanush Nanjunda, Jaeyoung Kang, and Tajana Rosing.
2022. FHDnn: Communication efficient and robust federated learning for AIoT networks. In 59th ACM/IEEE De-
sign Automation Conference (DAC’22). Association for Computing Machinery, New York, NY, USA, 37–42.
DOI:10.1145/3489517.3530394
[24] Melvin Chelli, Cédric Prigent, René Schubotz, Alexandru Costan, Gabriel Antoniu, Loïc Cudennec, and Philipp
Slusallek. 2023. FedGuard: Selective parameter aggregation for poisoning attack mitigation in federated learning.
In IEEE International Conference on Cluster Computing (CLUSTER’23). IEEE, 72–81.
[25] Hao Chen, Xixiang Lv, and Wei Zheng. 2023. AFL: Attention-based Byzantine-robust federated learning with vector
filter. In IEEE Global Communications Conference (GLOBECOM’23). IEEE, 595–600.
[26] Junbao Chen, Jingfeng Xue, Yong Wang, Lu Huang, Thar Baker, and Zhixiong Zhou. 2023. Privacy-preserving and
traceable federated learning for data sharing in industrial IoT applications. Expert Syst. Applic. 213 (2023), 119036.
DOI:[Link]
[27] Li Chen, Fan Ang, Yunfei Chen, and Weidong Wang. 2023. Robust federated learning with noisy labeled data through
loss function correction. IEEE Trans. Netw. Sci. Eng. 10, 3 (2023), 1501–1511. DOI:[Link]
3227287
[28] Mingzhe Chen, Deniz Gündüz, Kaibin Huang, Walid Saad, Mehdi Bennis, Aneta Vulgarakis Feljan, and H. Vincent
Poor. 2021. Distributed learning in wireless networks: Recent progress and future challenges. IEEE J. Select. Areas
Commun. 39, 12 (2021), 3579–3605.
[29] Yiqiang Chen, Xin Qin, Jindong Wang, Chaohui Yu, and Wen Gao. 2020. FedHealth: A federated transfer learning
framework for wearable healthcare. IEEE Intell. Syst. 35, 4 (2020), 83–93.
[30] Zhixiong Chen, Wenqiang Yi, Yuanwei Liu, and Arumugam Nallanathan. 2024. Robust federated learning for unre-
liable and resource-limited wireless networks. IEEE Trans. Wirel. Commun. 23, 8 (2024), 9793–9809.
[31] Long Cheng, Fang Liu, and Danfeng Yao. 2017. Enterprise data breach: Causes, challenges, prevention, and future
directions. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 7, 5 (2017), e1211.
[32] Tianyue Chu and Nikolaos Laoutaris. 2024. FedQV: Leveraging quadratic voting in federated learning. ACM Measur.
Anal. Comput. Syst. 8, 2 (2024), 1–36.
[33] Francesco Colosimo and Floriano De Rango. 2024. Byzantine-robust federated learning based on dynamic gradient
filtering. In International Wireless Communications and Mobile Computing (IWCMC’24). IEEE, 1062–1067.
[34] Francesco Colosimo and Floriano De Rango. 2024. Dynamic gradient filtering in federated learning with Byzantine
failure robustness. Fut. Gen. Comput. Syst. 160 (2024), 784–797.
[35] Shaveta Dargan, Munish Kumar, Maruthi Rohit Ayyagari, and Gulshan Kumar. 2020. A survey of deep learning and
its applications: A new paradigm to machine learning. Archiv. Computat. Meth. Eng. 27 (2020), 1071–1092.
[36] Caiqin Dong, Jian Weng, Ming Li, Jia-Nan Liu, Zhiquan Liu, Yudan Cheng, and Shui Yu. 2023. Privacy-preserving
and Byzantine-robust federated learning. IEEE Trans. Depend. Secure Comput. 21, 2 (2023), 889–904.
[37] Shi Dong, Ping Wang, and Khushnood Abbas. 2021. A survey on deep learning and its applications. Comput. Sci. Rev.
40 (2021), 100379.
[38] Ye Dong, Xiaojun Chen, Kaiyun Li, Dakui Wang, and Shuai Zeng. 2021. FLOD: Oblivious defender for private
Byzantine-robust federated learning with dishonest-majority. In Computer Security – ESORICS 2021, Elisa Bertino,
Haya Shulman, and Michael Waidner (Eds.). Springer International Publishing, Cham, 497–518.
[39] Mengyao Du, Miao Zhang, Lin Liu, Kai Xu, and Quanjun Yin. 2023. Credit-based differential privacy stochastic
model aggregation algorithm for robust federated learning via blockchain. In 52nd International Conference on Parallel
Processing. 452–461.
[40] Fatima Elhattab, Sara Bouchenak, Rania Talbi, and Vlad Nitu. 2023. Robust federated learning for ubiquitous com-
puting through mitigation of edge-case backdoor attacks. Proc. ACM Interact. Mob. Wear. Ubiq. Technol. 6, 4, Article
162 (Jan. 2023), 27 pages. DOI:[Link]
[41] M. Sakib Osman Eshan, Md Naimul Huda Nafi, Nazmus Sakib, Mehedi Hasan Emon, Tanzim Reza, Mohammad Za-
vid Parvez, Prabal Datta Barua, and Subrata Chakraborty. 2023. Byzantine-resilient federated learning leveraging
confidence score to identify retinal disease. In International Conference on Digital Image Computing: Techniques and
Applications (DICTA’23). IEEE, 81–88.


[42] Shuming Fan, Chenpei Wang, Xinyu Ruan, Hongjian Shi, Ruhui Ma, and Haibing Guan. 2024. FedREAS: A robust
efficient aggregation and selection framework for federated learning. ACM Trans. Asian Low-resour. Lang. Inf. Process.
(2024), 1–26.
[43] Xiangyu Fan, Zheyuan Shen, Wei Fan, Keke Yang, and Jing Li. 2023. Byzantine-robust federated learning via server-
side mixture of experts. In Pacific Rim International Conference on Artificial Intelligence. Springer, 41–52.
[44] Xin Fan, Yue Wang, Yan Huo, and Zhi Tian. 2022. Best effort voting power control for Byzantine-resilient federated
learning over the air. In IEEE International Conference on Communications Workshops (ICC Workshops’22). 806–811.
DOI:[Link]
[45] Minghong Fang, Xiaoyu Cao, Jinyuan Jia, and Neil Gong. 2020. Local model poisoning attacks to Byzantine-Robust
federated learning. In 29th USENIX Security Symposium (USENIX Security’20). 1605–1622.
[46] Xiuwen Fang and Mang Ye. 2022. Robust federated learning with noisy and heterogeneous clients. In IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR’22). 10072–10081.
[47] Bao Feng, Changyi Ma, Yu Liu, Qinghui Hu, Yan Lei, Meiqi Wan, Fan Lin, Jin Cui, Wansheng Long, and Enming
Cui. 2024. Deep learning vs. robust federal learning for distinguishing adrenal metastases from benign lesions with
multi-phase CT images. Heliyon 10, 3 (2024), 1–11.
[48] Andras Ferenczi and Costin Bădică. 2024. Fully decentralized privacy-enabled federated learning system based on
Byzantine-resilient consensus protocol. Simul. Model. Pract. Theor. 136 (2024), 102987.
[49] Reza Fotohi, Fereidoon Shams Aliee, and Bahar Farahani. 2024. Decentralized and robust privacy-preserving model
using blockchain-enabled federated deep learning in intelligent enterprises. Appl. Soft Comput. 161 (2024), 111764.
[50] Eyad Gad, Zubair Md Fadlullah, and Mostafa M. Fouda. 2024. A robust federated learning approach for combating
attacks against IoT systems under non-IID challenges. In International Conference on Smart Applications, Communi-
cations and Networking (SmartNets’24). IEEE, 1–6.
[51] Xinwen Gao, Shaojing Fu, Lin Liu, and Yuchuan Luo. 2024. BVDFed: Byzantine-resilient and verifiable aggregation
for differentially private federated learning. Front. Comput. Sci. 18, 5 (2024), 185810.
[52] Zhihe Gao, Xiaoming Chen, and Xiaodan Shao. 2022. Robust federated learning for edge-intelligent networks. Sci.
China Inf. Sci. 65, 3 (2022), 132306.
[53] Lina Ge, Xin He, Guanghui Wang, and Junyang Yu. 2021. Chain-AAFL: Chained adversarial-aware federated learning
framework. In Web Information Systems and Applications, Chunxiao Xing, Xiaoming Fu, Yong Zhang, Guigang Zhang,
and Chaolemen Borjigin (Eds.). Springer International Publishing, Cham, 237–248.
[54] Till Gehlhar, Felix Marx, Thomas Schneider, Ajith Suresh, Tobias Wehrle, and Hossein Yalame. 2023. SafeFL: MPC-
friendly framework for private and robust federated learning. In IEEE Security and Privacy Workshops (SPW’23). IEEE,
69–76.
[55] Gangchao Geng, Tianyang Cai, and Zheng Yang. 2023. Better safe than sorry: Constructing Byzantine-robust feder-
ated learning with synthesized trust. Electronics 12, 13 (2023), 2926.
[56] Avishek Ghosh, Justin Hong, Dong Yin, and Kannan Ramchandran. 2019. Robust federated learning in a heteroge-
neous environment. arXiv preprint arXiv:1906.06629 (2019).
[57] Zirui Gong, Liyue Shen, Yanjun Zhang, Leo Yu Zhang, Jingwei Wang, Guangdong Bai, and Yong Xiang. 2023.
AGRAMPLIFIER: Defending federated learning against poisoning attacks through local update amplification. IEEE
Trans. Inf. Forens. Secur. 19 (2023), 1241–1250.
[58] Ala Gouissem, Khalid Abualsaud, Elias Yaacoub, Tamer Khattab, and Mohsen Guizani. 2023. Collaborative Byzantine
resilient federated learning. IEEE Internet Things J. 10, 18 (2023), 15887–15899.
[59] Jürgen Groß. 2003. Linear Regression. Vol. 175. Springer Science & Business Media.
[60] Zhipin Gu, Liangzhong He, Peiyan Li, Peng Sun, Jiangyong Shi, and Yuexiang Yang. 2021. FREPD: A robust federated
learning framework on variational autoencoder. Comput. Syst. Sci. Eng. 39, 3 (2021), 307–320.
[61] Zhipin Gu, Jiangyong Shi, Yuexiang Yang, and Liangzhong He. 2023. Defending against poisoning attacks in feder-
ated learning from a spatial-temporal perspective. In 42nd International Symposium on Reliable Distributed Systems
(SRDS’23). IEEE, 25–34.
[62] Chao Guo, Buxin Guo, Tingting Zhu, Peihe Liu, and Cheng Gong. 2023. FLEvaluate: Robust federated learning based
on trust evaluate. In IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communi-
cations (TrustCom’23). IEEE, 2269–2274.
[63] Hongle Guo, Yingchi Mao, Xiaoming He, Benteng Zhang, Tianfu Pang, and Ping Ping. 2024. Improving federated
learning through abnormal client detection and incentive. Comput. Model. Eng. Sci. 139, 1 (2024), 383–403.
[64] Hanxi Guo, Hao Wang, Tao Song, Yang Hua, Zhangcheng Lv, Xiulang Jin, Zhengui Xue, Ruhui Ma, and Haibing Guan.
2021. Siren: Byzantine-robust federated learning via proactive alarming. In ACM Symposium on Cloud Computing.
47–60.
[65] Hanxi Guo, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Xiulang Jin, Zhengui Xue, and Haibing Guan. 2024. SIREN+:
Robust federated learning with proactive alarming and differential privacy. IEEE Trans. Depend. Secure Comput. 21,
5 (2024), 4843–4860.

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.
[66] Qi Guo, Di Wu, Yong Qi, Saiyu Qi, and Qian Li. 2022. FLMJR: Improving robustness of federated learning via model
stability. In Computer Security – ESORICS 2022, Vijayalakshmi Atluri, Roberto Di Pietro, Christian D. Jensen, and
Weizhi Meng (Eds.). Springer Nature Switzerland, Cham, 405–424.
[67] Shangwei Guo, Tianwei Zhang, Han Yu, Xiaofei Xie, Lei Ma, Tao Xiang, and Yang Liu. 2022. Byzantine-resilient
decentralized stochastic gradient descent. IEEE Trans. Circ. Syst. Vid. Technol. 32, 6 (2022), 4096–4106.
DOI:https://[Link]/10.1109/TCSVT.2021.3116976
[68] Zhenyuan Guo, Lei Xu, and Liehuang Zhu. 2023. FedSIGN: A sign-based federated learning framework with privacy
and robustness guarantees. Comput. Secur. 135 (2023), 103474.
[69] Ashish Gupta, Tie Luo, Mao V. Ngo, and Sajal K. Das. 2022. Long-short history of gradients is all you need: Detecting
malicious and unreliable clients in federated learning. In Computer Security – ESORICS 2022, Vijayalakshmi Atluri,
Roberto Di Pietro, Christian D. Jensen, and Weizhi Meng (Eds.). Springer Nature Switzerland, Cham, 445–465.
[70] Manupriya Gupta, Pavas Goyal, Rohit Verma, Rajeev Shorey, and Huzur Saran. 2022. FedFM: Towards a robust fed-
erated learning approach for fault mitigation at the edge nodes. In 14th International Conference on COMmunication
Systems & NETworkS (COMSNETS’22). 362–370. DOI:[Link]
[71] Song Han, Hongxin Ding, Shuai Zhao, Siqi Ren, Zhibo Wang, Jianhong Lin, and Shuhao Zhou. 2023. Practical and
robust federated learning with highly scalable regression training. IEEE Trans. Neural Netw. Learn. Syst. 25, 10 (2023),
13801–13815.
[72] Yufei Han and Xiangliang Zhang. 2020. Robust federated learning via collaborative machine teaching. In AAAI Con-
ference on Artificial Intelligence, Vol. 34. 4075–4082.
[73] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert
Eichner, Chloé Kiddon, and Daniel Ramage. 2018. Federated learning for mobile keyboard prediction. arXiv preprint
arXiv:1811.03604 (2018).
[74] Junyuan Hong, Haotao Wang, Zhangyang Wang, and Jiayu Zhou. 2023. Federated robustness propagation: Sharing
adversarial robustness in heterogeneous federated learning. In AAAI Conference on Artificial Intelligence, Vol. 37.
7893–7901.
[75] Chris Jay Hoofnagle, Bart Van Der Sloot, and Frederik Zuiderveen Borgesius. 2019. The European Union general
data protection regulation: What it is and what it means. Inf. Commun. Technol. Law 28, 1 (2019), 65–98.
[76] Li Huang, Andrew L. Shea, Huining Qian, Aditya Masurkar, Hao Deng, and Dianbo Liu. 2019. Patient clustering
improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed elec-
tronic medical records. J. Biomed. Inform. 99 (2019), 103291.
[77] Siman Huang, Yan Bai, Zehua Wang, and Peng Liu. 2022. Defending against poisoning attack in federated learn-
ing using isolated forest. In 2nd International Conference on Computer, Control and Robotics (ICCCR’22). 224–229.
DOI:[Link]
[78] Wenke Huang, Mang Ye, Zekun Shi, Guancheng Wan, He Li, Bo Du, and Qiang Yang. 2024. Federated learning for
generalization, robustness, fairness: A survey and benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 46, 12 (2024),
9387–9406.
[79] Yuxian Huang, Geng Yang, Hao Zhou, Hua Dai, Dong Yuan, and Shui Yu. 2024. VPPFL: A verifiable privacy-
preserving federated learning scheme against poisoning attacks. Comput. Secur. 136 (2024), 103562.
[80] Najeeb Moharram Jebreel, Josep Domingo-Ferrer, David Sánchez, and Alberto Blanco-Justicia. 2024. LFighter: De-
fending against the label-flipping attack in federated learning. Neural Netw. 170 (2024), 111–126.
[81] Yanan Jia, Wenxin Li, Mengwei Yan, and Bowen Yan. 2023. Robust federated learning based on chained self-paced
learning. In 3rd International Conference on Computer Science, Electronic Information Engineering and Intelligent Con-
trol Technology (CEI’23). IEEE, 190–193.
[82] Xuefeng Jiang, Sheng Sun, Yuwei Wang, and Min Liu. 2022. Towards federated learning against noisy labels via
local self-regularization. In 31st ACM International Conference on Information & Knowledge Management (CIKM ’22).
Association for Computing Machinery, New York, NY, USA, 862–873. DOI:[Link]
[83] Yalan Jiang, Dan Wang, Bin Song, and Shengyang Luo. 2024. HDHRFL: A hierarchical robust federated learning
framework for dual-heterogeneous and noisy clients. Fut. Gen. Comput. Syst. 160 (2024), 185–196.
[84] Yifeng Jiang, Weiwen Zhang, and Yanxi Chen. 2023. Data quality detection mechanism against label flipping attacks
in federated learning. IEEE Trans. Inf. Forens. Secur. 18 (2023), 1625–1637.
[85] Mahdee Jodayree, Wenbo He, and Ryszard Janicki. 2023. Preventing image data poisoning attacks in federated ma-
chine learning by an encrypted verification key. Proced. Comput. Sci. 225 (2023), 2723–2732.
[86] Michael I. Jordan and Tom M. Mitchell. 2015. Machine learning: Trends, perspectives, and prospects. Science 349,
6245 (2015), 255–260.
[87] Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista
Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings et al. 2021. Advances and open problems in feder-
ated learning. Found. Trends Mach. Learn. 14, 1–2 (2021), 1–210.

[88] Harsh Kasyap, Arpan Manna, and Somanath Tripathy. 2022. An efficient blockchain assisted reputation aware
decentralized federated learning framework. IEEE Trans. Netw. Serv. Manag. 20, 3 (2022), 2771–2782.
DOI:https://[Link]/10.1109/TNSM.2022.3231283
[89] Harsh Kasyap and Somanath Tripathy. 2024. Privacy-preserving and Byzantine-robust federated learning framework
using permissioned blockchain. Expert Syst. Applic. 238 (2024), 122210.
[90] Harsh Kasyap and Somanath Tripathy. 2024. SINE: Similarity is not enough for mitigating local model poisoning
attacks in federated learning. IEEE Trans. Depend. Secure Comput. 21, 5 (2024), 4481–4494.
[91] Harsh Kasyap, Somanath Tripathy, and Mauro Conti. 2023. HDFL: Private and robust federated learning using hy-
perdimensional computing. In IEEE 22nd International Conference on Trust, Security and Privacy in Computing and
Communications (TrustCom’23). IEEE, 214–221.
[92] Kevin I-Kai Wang, Xiaokang Zhou, Wei Liang, Zheng Yan, and Jinhua She. 2021. Federated transfer learning based
cross-domain prediction for smart manufacturing. IEEE Trans. Industr. Inform. 18, 6 (2021), 4088–4096.
[93] Latif U. Khan, Walid Saad, Zhu Han, Ekram Hossain, and Choong Seon Hong. 2021. Federated learning for internet
of things: Recent advances, taxonomy, and open challenges. IEEE Commun. Surv. Tutor. 23, 3 (2021), 1759–1799.
[94] Momin Ahmad Khan, Virat Shejwalkar, Amir Houmansadr, and Fatima M. Anwar. 2023. On the pitfalls of security
evaluation of robust federated learning. In IEEE Security and Privacy Workshops (SPW’23). IEEE, 57–68.
[95] Sajjad Khan, Jorao Gomes Jr, Muhammad Habib ur Rehman, and Davor Svetinovic. 2023. Dynamic behavior assess-
ment protocol for secure decentralized federated learning. Internet Things 24 (2023), 100956.
[96] Sangmook Kim, Wonyoung Shin, Soohyuk Jang, Hwanjun Song, and Se-Young Yun. 2022. FedRN: Exploiting k-
Reliable neighbors towards robust federated learning. In 31st ACM International Conference on Information & Knowl-
edge Management (CIKM’22). Association for Computing Machinery, New York, NY, USA, 972–981.
DOI:https://[Link]/10.1145/3511808.3557322
[97] Xenia Konti, Hans Riess, Manos Giannopoulos, Yi Shen, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, and
Michael M. Zavlanos. 2024. Distributionally robust clustered federated learning: A case study in healthcare. arXiv
preprint arXiv:2410.07039 (2024).
[98] Thanasis Kotsiopoulos, Panagiotis Sarigiannidis, Dimosthenis Ioannidis, and Dimitrios Tzovaras. 2021. Machine
learning and deep learning in smart manufacturing: The smart grid paradigm. Comput. Sci. Rev. 40 (2021), 100341.
[99] Prabhleen Kukreja and V. Mahendran. 2024. PraaKrum: A practical Byzantine-resilient federated learning algorithm.
In 16th International Conference on COMmunication Systems & NETworkS (COMSNETS’24). IEEE, 936–944.
[100] Long Tan Le, Tung-Anh Nguyen, Tuan-Dung Nguyen, Nguyen H. Tran, Nguyen Binh Truong, Phuong L. Vo,
Bui Thanh Hung, and Tuan Anh Le. 2024. Distributionally robust federated learning for mobile edge networks. Mob.
Netw. Applic. 29, 1 (2024), 1–11.
[101] Suchul Lee. 2024. Adaptive selection of loss function for federated learning clients under adversarial attacks. IEEE
Access 12 (2024), 96051–96062.
[102] Younghan Lee, Yungi Cho, Woorim Han, Ho Bae, and Yunheung Paek. 2023. FLGuard: Byzantine-robust federated
learning via ensemble of contrastive models. In European Symposium on Research in Computer Security. Springer,
65–84.
[103] Youngjoon Lee, Sangwoo Park, and Joonhyuk Kang. 2023. Byzantine-resilient federated learning via reverse aggre-
gation. In IEEE Virtual Conference on Communications (VCC’23). IEEE, 98–102.
[104] Cody Lewis, Vijay Varadharajan, and Nasimul Noman. 2023. Attacks against federated learning defense systems and
their mitigation. J. Mach. Learn. Res. 24, 30 (2023), 1–50.
[105] Anran Li, Yue Cao, Jiabao Guo, Hongyi Peng, Qing Guo, and Han Yu. 2023. FedCSS: Joint client-and-sample selection
for hard sample-aware noise-robust federated learning. Proc. ACM Manag. Data 1, 3 (2023), 1–24.
[106] Beibei Li, Peiran Wang, Zerui Shao, Ao Liu, Yukun Jiang, and Yizhou Li. 2023. Defending Byzantine attacks in en-
semble federated learning: A reputation-based phishing approach. Fut. Gen. Comput. Syst. 147 (2023), 136–148.
[107] Henger Li, Xiaolin Sun, and Zizhan Zheng. 2022. Learning to attack federated learning: A model-based reinforcement
learning attack framework. Advan. Neural Inf. Process. Syst. 35 (2022), 35007–35020.
[108] Jiani Li, Feiyang Cai, and Xenofon Koutsoukos. 2022. Byzantine resilient aggregation in distributed reinforcement
learning. In Distributed Computing and Artificial Intelligence, Volume 1: 18th International Conference, Kenji Matsui,
Sigeru Omatu, Tan Yigitcanlar, and Sara Rodríguez González (Eds.). Springer International Publishing, Cham, 56–66.
[109] Jiachun Li, Yan Meng, Lichuan Ma, Suguo Du, Haojin Zhu, Qingqi Pei, and Xuemin Shen. 2021. A federated learning
based privacy-preserving smart healthcare system. IEEE Trans. Industr. Inform. 18, 3 (2021), 2021–2031.
[110] Junyi Li, Jian Pei, and Heng Huang. 2022. Communication-efficient robust federated learning with noisy labels. In
28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’22). Association for Computing Ma-
chinery, New York, NY, USA, 914–924. DOI:[Link]
[111] Jing Li, Youliang Tian, Zhou Zhou, Axin Xiang, Shuai Wang, and Jinbo Xiong. 2024. PSFL: Ensuring data privacy and
model security for federated learning. IEEE Internet Things J. 11, 5 (2024), 26234–26252.

[112] Jiaming Li, Xinyue Zhang, and Liang Zhao. 2022. Robust federated learning based on metrics learning and unsuper-
vised clustering for malicious data detection. In ACM Southeast Conference (ACM SE’22). Association for Computing
Machinery, New York, NY, USA, 238–242. DOI:[Link]
[113] Li Li, Yuxi Fan, Mike Tse, and Kuo-Yi Lin. 2020. A review of applications in federated learning. Comput. Industr. Eng.
149 (2020), 106854. DOI:[Link]
[114] Qun Li, Congying Duan, and Siguang Chen. 2023. Robust federated learning with parameter classification and
weighted aggregation against noisy labels. In IEEE Global Communications Conference (GLOBECOM’23). IEEE,
2445–2450.
[115] Qi Li, Zhuotao Liu, Qi Li, and Ke Xu. 2023. martFL: Enabling utility-driven data marketplace with a robust and
verifiable federated learning architecture. In ACM SIGSAC Conference on Computer and Communications Security.
1496–1510.
[116] Shenghui Li, Edith Ngai, and Thiemo Voigt. 2021. Byzantine-robust aggregation in federated learning empowered
industrial IoT. IEEE Trans. Industr. Inform. 19, 2 (2021), 1165–1175.
[117] Shenghui Li, Edith Ngai, and Thiemo Voigt. 2023. Byzantine-robust aggregation in federated learning empowered
industrial IoT. IEEE Trans. Industr. Inform. 19, 2 (2023), 1165–1175. DOI:[Link]
[118] Shenghui Li, Edith Ngai, Fanghua Ye, and Thiemo Voigt. 2022. Auto-weighted robust federated learning with cor-
rupted data sources. ACM Trans. Intell. Syst. Technol. 13, 5, Article 73 (June 2022), 20 pages. DOI:[Link]
1145/3517821
[119] Shenghui Li, Edith C.-H. Ngai, and Thiemo Voigt. 2023. An experimental study of Byzantine-robust aggregation
schemes in federated learning. IEEE Trans. Big Data 10, 6 (2023), 1–13. DOI:[Link]
3237397
[120] Tian Li, Shengyuan Hu, Ahmad Beirami, and Virginia Smith. 2021. Ditto: Fair and robust federated learning through
personalization. In 38th International Conference on Machine Learning (Proceedings of Machine Learning Research),
Marina Meila and Tong Zhang (Eds.), Vol. 139. PMLR, 6357–6368. Retrieved from: [Link]
[Link]
[121] Wei Li, Yuanbo Chai, Fazlullah Khan, Syed Rooh Ullah Jan, Sahil Verma, Varun G. Menon, Kavita, and Xingwang
Li. 2021. A comprehensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare
system. Mob. Netw. Applic. 26 (2021), 234–252.
[122] Wenjie Li, Kai Fan, Kan Yang, Yintang Yang, and Hui Li. 2023. PBFL: Privacy-preserving and Byzantine-robust fed-
erated learning empowered industry 4.0. IEEE Internet Things J. 11, 4 (2023), 7128–7140.
[123] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. 2019. On the convergence of FedAvg on
non-IID data. arXiv preprint arXiv:1907.02189 (2019).
[124] Xiumin Li, Mi Wen, Siying He, Rongxing Lu, and Liangliang Wang. 2023. A scheme for robust federated learning
with privacy-preserving based on Krum AGR. In IEEE/CIC International Conference on Communications in China
(ICCC’23). IEEE, 1–6.
[125] Yanli Li, Weiping Ding, Huaming Chen, Wei Bao, and Dong Yuan. 2024. Contribution-wise Byzantine-robust aggre-
gation for class-balanced federated learning. Inf. Sci. 667 (2024), 120475.
[126] Yanli Li, Abubakar Sadiq Sani, Dong Yuan, and Wei Bao. 2022. Enhancing federated learning robustness through
clustering non-IID features. In Asian Conference on Computer Vision. 41–55.
[127] Yijing Li, Xiaofeng Tao, Xuefei Zhang, Junjie Liu, and Jin Xu. 2021. Privacy-preserved federated learning for au-
tonomous driving. IEEE Trans. Intell. Transp. Syst. 23, 7 (2021), 8423–8434.
[128] Yanbin Li, Xuemei Wang, Runkang Sun, Xiaojun Xie, Shijia Ying, and Shougang Ren. 2023. Trustiness-based hierar-
chical decentralized federated learning. Knowl.-based Syst. 276 (2023), 110763.
[129] Yanli Li, Dong Yuan, Abubakar Sadiq Sani, and Wei Bao. 2023. Enhancing federated learning robustness in adversarial
environment through clustering non-IID features. Comput. Secur. 132 (2023), 103319.
[130] Zonghang Li, Yihong He, Hongfang Yu, Jiawen Kang, Xiaoping Li, Zenglin Xu, and Dusit Niyato. 2022. Data
heterogeneity-robust federated learning via group client selection in industrial IoT. IEEE Internet Things J. 9, 18
(2022), 17844–17857. DOI:[Link]
[131] Zhuohang Li, Luyang Liu, Jiaxin Zhang, and Jian Liu. 2021. Byzantine-robust federated learning through spatial-
temporal analysis of local model updates. In IEEE 27th International Conference on Parallel and Distributed Systems
(ICPADS’21). 372–379. DOI:[Link]
[132] Xiang-Yu Liang, Heng-Ru Zhang, Wei Tang, and Fan Min. 2024. Robust federated learning with voting and scaling.
Fut. Gen. Comput. Syst. 153 (2024), 113–124.
[133] Wei Yang Bryan Lim, Nguyen Cong Luong, Dinh Thai Hoang, Yutao Jiao, Ying-Chang Liang, Qiang Yang, Dusit Niy-
ato, and Chunyan Miao. 2020. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun.
Surv. Tutor. 22, 3 (2020), 2031–2063.

[134] Xiaoci Lin, Yanbin Li, Xiaojun Xie, Yu Ding, Xuehui Wu, and Chunpeng Ge. 2024. SF-CABD: Secure Byzantine fault
tolerance federated learning on Non-IID data. Knowl.-based Syst. 296 (2024), 111851.
[135] Ying Lin, Shengfu Ning, Jianpeng Hu, Jiansong Liu, Yifan Cao, Junyuan Zhang, and Huan Pi. 2022. PPBR-FL: A
privacy-preserving and Byzantine-robust federated learning system. In Knowledge Science, Engineering and Manage-
ment, Gerard Memmi, Baijian Yang, Linghe Kong, Tianwei Zhang, and Meikang Qiu (Eds.). Springer International
Publishing, Cham, 39–50.
[136] Chang Liu, Shaoyong Guo, Song Guo, Yong Yan, Xuesong Qiu, and Suxiang Zhang. 2021. LTSM: Lightweight and
trusted sharing mechanism of IoT data in smart city. IEEE Internet Things J. 9, 7 (2021), 5080–5093.
[137] Shuo Liu, Nirupam Gupta, and Nitin H. Vaidya. 2021. Approximate Byzantine fault-tolerance in distributed optimiza-
tion. In ACM Symposium on Principles of Distributed Computing (PODC’21). Association for Computing Machinery,
New York, NY, USA, 379–389. DOI:[Link]
[138] Shukan Liu, Zhenyu Li, Qiao Sun, Lin Chen, Xianfeng Zhang, and Li Duan. 2023. FLOW: A robust federated learning
framework to defend against model poisoning attacks in IoTs. IEEE Internet Things J. 11, 9 (2023), 15075–15086.
[139] Shangdong Liu, Xi Xu, Musen Wang, Fei Wu, Yimu Ji, Chenxi Zhu, and Qurui Zhang. 2023. FLGQM: Robust federated
learning based on geometric and qualitative metrics. Appl. Sci. 14, 1 (2023), 351.
[140] Tian Liu, Xueyang Hu, and Tao Shu. 2022. Assisting backdoor federated learning with whole population knowledge
alignment in mobile edge computing. In 19th Annual IEEE International Conference on Sensing, Communication, and
Networking (SECON’22). 416–424. DOI:[Link]
[141] Xun Liu, Jiangming Li, Siyang Chen, Xiaofeng Jiang, Feng Yang, and Jian Yang. 2024. Privacy-preservation robust fed-
erated learning with blockchain-based hierarchical framework. In International Conference on Computing, Machine
Learning and Data Science. 1–6.
[142] Xiaosong Liu, Chungen Xu, and Pan Zhang. 2023. Byzantine-robust federated learning based on multi-center secure
clustering. In International Conference on Electronics, Computers and Communication Technology. 203–207.
[143] Yang Liu, Yan Kang, Chaoping Xing, Tianjian Chen, and Qiang Yang. 2020. A secure federated transfer learning
framework. IEEE Intell. Syst. 35, 4 (2020), 70–82.
[144] Yi Liu, Ruihui Zhao, Jiawen Kang, Abdulsalam Yassine, Dusit Niyato, and Jialiang Peng. 2021. Towards
communication-efficient and attack-resistant federated edge learning for industrial internet of things. ACM Trans.
Internet Technol. 22, 3, Article 59 (Dec. 2021), 22 pages. DOI:[Link]
[145] Shiwei Lu, Ruihu Li, Xuan Chen, and Yuena Ma. 2022. Defense against local model poisoning attacks to Byzantine-
robust federated learning. Front. Comput. Sci. 16, 6 (2022), 166337.
[146] Shiwei Lu, Ruihu Li, and Wenbin Liu. 2024. FedDAA: A robust federated learning framework to protect privacy and
defend against adversarial attack. Front. Comput. Sci. 18, 2 (2024), 182307.
[147] Yanyang Lu and Lei Fan. 2020. An efficient and robust aggregation algorithm for learning federated CNN. In 3rd
International Conference on Signal Processing and Machine Learning (SPML’20). Association for Computing Machinery,
New York, NY, USA, 1–7. DOI:[Link]
[148] Yunlong Lu, Xiaohong Huang, Yueyue Dai, Sabita Maharjan, and Yan Zhang. 2020. Differentially private asynchro-
nous federated learning for mobile edge computing in urban informatics. IEEE Trans. Industr. Inform. 16, 3 (2020),
2134–2143. DOI:[Link]
[149] Shijie Luan, Xiang Lu, Zhuangzhuang Zhang, Guangsheng Chang, and Yunchuan Guo. 2023. Efficient and privacy-
preserving Byzantine-robust federated learning. In IEEE Global Communications Conference (GLOBECOM’23). IEEE,
2202–2208.
[150] Zhirong Luan, Wenrui Li, Meiqin Liu, and Badong Chen. 2024. Robust federated learning: Maximum correntropy
aggregation against Byzantine attacks. IEEE Trans. Neural Netw. Learn. Syst. 36, 1 (2024), 62–75.
[151] Yayu Luo, Tongzhijun Zhu, Zediao Liu, Tenglong Mao, Ziyi Chen, Huan Pi, and Ying Lin. 2024. GANFAT: Robust
federated adversarial learning with label distribution skew. Fut. Gen. Comput. Syst. 160 (2024), 711–723.
[152] Lingjuan Lyu, Han Yu, Xingjun Ma, Chen Chen, Lichao Sun, Jun Zhao, Qiang Yang, and Philip S. Yu. 2022. Privacy
and robustness in federated learning: Attacks and defenses. IEEE Trans. Neural Netw. Learn. Syst. 35, 7 (2022), 1–21.
DOI:[Link]
[153] L. Lyu, H. Yu, J. Zhao, and Q. Yang. 2020. Threats to federated learning. In Federated Learning (Lecture Notes in Com-
puter Science, Vol. 12500). Springer, 480–501.
[154] Hua Ma, Qun Li, Yifeng Zheng, Zhi Zhang, Xiaoning Liu, Yansong Gao, Said F. Al-Sarawi, and Derek Abbott. 2023.
MUD-PQFed: Towards malicious user detection on model corruption in privacy-preserving quantized federated
learning. Comput. Secur. 133 (2023), 103406.
[155] Jing Ma, Si-Ahmed Naas, Stephan Sigg, and Xixiang Lyu. 2022. Privacy-preserving federated learning based on multi-
key homomorphic encryption. Int. J. Intell. Syst. 37, 9 (2022), 5880–5901.
[156] Xu Ma, Xiaoqian Sun, Yuduo Wu, Zheli Liu, Xiaofeng Chen, and Changyu Dong. 2022. Differentially private
Byzantine-robust federated learning. IEEE Trans. Parallel Distrib. Syst. 33, 12 (2022), 3690–3701. DOI:[Link]
10.1109/TPDS.2022.3167434

[157] Xu Ma, Yuqing Zhou, Laihua Wang, and Meixia Miao. 2022. Privacy-preserving Byzantine-robust federated learning.
Comput. Stand. Interf. 80 (2022), 103561. DOI:[Link]
[158] Xiaodong Ma, Jia Zhu, Zhihao Lin, Shanxuan Chen, and Yangjie Qin. 2022. A state-of-the-art survey on solving
non-IID data in federated learning. Fut. Gen. Comput. Syst. 135 (2022), 244–258.
[159] Sanaullah Manzoor and Adnan Noor Mian. 2021. Robust federated learning-based content caching over uncertain
wireless transmission channels in FRANs. In 19th International Symposium on Modeling and Optimization in Mobile,
Ad hoc, and Wireless Networks (WiOpt’21). 1–6. DOI:[Link]
[160] Yunlong Mao, Xinyu Yuan, Xinyang Zhao, and Sheng Zhong. 2021. Romoa: Robust model aggregation for the resis-
tance of federated learning to model poisoning attacks. In Computer Security – ESORICS 2021, Elisa Bertino, Haya
Shulman, and Michael Waidner (Eds.). Springer International Publishing, Cham, 476–496.
[161] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-
efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. PMLR, 1273–1282.
[162] Yuxi Mi, Yutong Mu, Shuigeng Zhou, and Jihong Guan. 2021. FedMDR: Federated model distillation with robust
aggregation. In Web and Big Data, Leong Hou U., Marc Spaniol, Yasushi Sakurai, and Junying Chen (Eds.). Springer
International Publishing, Cham, 18–32.
[163] Yinbin Miao, Ziteng Liu, Hongwei Li, Kim-Kwang Raymond Choo, and Robert H. Deng. 2022. Privacy-preserving
Byzantine-robust federated learning via blockchain systems. IEEE Trans. Inf. Forens. Secur. 17 (2022), 2848–2861.
DOI:[Link]
[164] Sparsh Mittal and Shraiysh Vaishay. 2019. A survey of techniques for optimizing deep learning on GPUs. J. Syst.
Archit. 99 (2019), 101635.
[165] Mohammad Moshawrab, Mehdi Adda, Abdenour Bouzouane, Hussein Ibrahim, and Ali Raad. 2023. PolyFLAG_SVM:
A polymorphic federated learning aggregation of gradients support vector machines framework. Proced. Comput. Sci.
224 (2023), 139–146.
[166] M. A. Moyeen, Kuljeet Kaur, Anjali Agarwal, Ricardo Manzano, Marzia Zaman, and Nishith Goel. 2023. FedChal-
lenger: Challenge-response-based defence for federated learning against Byzantine attacks. In IEEE Global Commu-
nications Conference (GLOBECOM’23). IEEE, 3843–3848.
[167] Xutong Mu, Ke Cheng, Yulong Shen, Xiaoxiao Li, Zhao Chang, Tao Zhang, and Xindi Ma. 2024. FedDMC: Efficient
and robust federated learning via detecting malicious clients. IEEE Trans. Depend. Secure Comput. 21, 6 (2024), 5259–
5274.
[168] Seyedsina Nabavirazavi, Rahim Taheri, and Sundararaja Sitharama Iyengar. 2024. Enhancing federated learning ro-
bustness through randomization and mixture. Fut. Gen. Comput. Syst. 158 (2024), 28–43.
[169] Yeshwanth Nagaraj and Ujjwal Gupta. 2024. Gradient guard: Robust federated learning using saliency maps. In 3rd
International Conference for Innovation in Technology (INOCON’24). IEEE, 1–6.
[170] Theodora Nevrataki, Anastasia Iliadou, George Ntolkeras, Ioannis Sfakianakis, Lazaros Lazaridis, George Maraslidis,
Nikolaos Asimopoulos, and George F. Fragulis. 2023. A survey on federated learning applications in healthcare,
finance, and data privacy/data security. In AIP Conference Proceedings, Vol. 2909. AIP Publishing.
[171] Dinh C. Nguyen, Ming Ding, Pubudu N. Pathirana, Aruna Seneviratne, Jun Li, and H. Vincent Poor. 2021. Federated
learning for internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 23, 3 (2021), 1622–1658.
[172] Truong X. Nguyen, An Ran Ran, Xiaoyan Hu, Dawei Yang, Meirui Jiang, Qi Dou, and Carol Y. Cheung. 2022. Federated
learning in ocular imaging: Current progress and future direction. Diagnostics 12, 11 (2022), 2835.
[173] Lina Ni, Xu Gong, Jufeng Li, Yuncan Tang, Zhuang Luan, and Jinquan Zhang. 2024. rFedFW: Secure and trustable
aggregation scheme for Byzantine-robust federated learning in internet of things. Inf. Sci. 653 (2024), 119784.
[174] Florian Nuding and Rudolf Mayer. 2022. Data poisoning in sequential and parallel federated learning. In ACM on
International Workshop on Security and Privacy Analytics (IWSPA’22). Association for Computing Machinery, New
York, NY, USA, 24–34. DOI:[Link]
[175] Olusola T. Odeyomi, Bassey Ude, and Kaushik Roy. 2023. Online decentralized multi-agents meta-learning with
Byzantine resiliency. IEEE Access 11 (2023), 68286–68300.
[176] Olusola T. Odeyomi and Gergely Zaruba. 2023. Byzantine-resilient federated learning with differential privacy using
online mirror descent. In International Conference on Computing, Networking and Communications (ICNC’23). IEEE,
66–70.
[177] Babatomiwa Omonayajo, Fadi Al-Turjman, and Nadire Cavus. 2022. Interactive and innovative technologies for smart
education. Comput. Sci. Inf. Syst. 19, 3 (2022), 1549–1564.
[178] Sharnil Pandya, Gautam Srivastava, Rutvij Jhaveri, M. Rajasekhara Babu, Sweta Bhattacharya, Praveen Kumar Reddy
Maddikunta, Spyridon Mastorakis, Md Jalil Piran, and Thippa Reddy Gadekallu. 2023. Federated learning for smart
cities: A comprehensive survey. Sustain. Ener. Technol. Assess. 55 (2023), 102987.
[179] Monalisa Panigrahi, Sourabh Bharti, and Arun Sharma. 2023. A reputation-aware hierarchical aggregation frame-
work for federated learning. Comput. Electric. Eng. 111 (2023), 108900.

[180] Alexandros Pantelopoulos and Nikolaos G. Bourbakis. 2009. A survey on wearable sensor-based systems for health
monitoring and prognosis. IEEE Trans. Syst., Man, Cybern., Part C (Applic. Rev.) 40, 1 (2009), 1–12.
[181] Jungwuk Park, Dong-Jun Han, Minseok Choi, and Jaekyun Moon. 2021. Sageflow: Robust federated learning against
both stragglers and adversaries. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer,
Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 840–851. Retrieved from:
[Link]
[182] Jihong Park, Sumudu Samarakoon, Anis Elgabli, Joongheon Kim, Mehdi Bennis, Seong-Lyun Kim, and Mérouane
Debbah. 2021. Communication-efficient and distributed learning over wireless networks: Principles and applications.
Proc. IEEE 109, 5 (2021), 796–819.
[183] Papa Pene, Weixian Liao, and Wei Yu. 2023. Incentive design for heterogeneous client selection: A robust federated
learning approach. IEEE Internet Things J. 11, 4 (2023), 5939–5950.
[184] Jie Peng, Zhaoxian Wu, Qing Ling, and Tianyi Chen. 2022. Byzantine-robust variance-reduced federated learning
over distributed non-I.I.D. data. Inf. Sci. 616 (2022), 367–391. DOI:[Link]
[185] Kilian Pfeiffer, Martin Rapp, Ramin Khalili, and Jörg Henkel. 2023. Federated learning for computationally-
constrained heterogeneous devices: A survey. ACM Comput. Surv. 55, 14s (2023).
[186] Bjarne Pfitzner, Nico Steckhan, and Bert Arnrich. 2021. Federated learning in a medical context: A systematic litera-
ture review. ACM Trans. Internet Technol. 21, 2 (2021), 1–31.
[187] Krishna Pillutla, Sham M. Kakade, and Zaid Harchaoui. 2022. Robust aggregation for federated learning. IEEE Trans.
Sig. Process. 70 (2022), 1142–1154. DOI:[Link]
[188] Adnan Qayyum, Muhammad Umar Janjua, and Junaid Qadir. 2022. Making federated learning robust to adversarial
attacks by learning data and model association. Comput. Secur. 121 (2022), 102827. DOI:[Link]
2022.102827
[189] Tao Qi, Huili Wang, and Yongfeng Huang. 2024. Towards the robustness of differentially private federated learning.
In AAAI Conference on Artificial Intelligence, Vol. 38. 19911–19919.
[190] Yu Qiao, Apurba Adhikary, Chaoning Zhang, and Choong Seon Hong. 2024. Towards robust federated learning
via logits calibration on Non-IID data. In IEEE Network Operations and Management Symposium. 1–7.
DOI:https://[Link]/10.1109/NOMS59830.2024.10575333
[191] Han Qin, Guimin Chen, Yuanhe Tian, and Yan Song. 2021. Improving federated learning for aspect-based sentiment
analysis via topic memories. In Conference on Empirical Methods in Natural Language Processing. 3942–3954.
[192] Youyang Qu, Md Palash Uddin, Chenquan Gan, Yong Xiang, Longxiang Gao, and John Yearwood. 2022. Blockchain-
enabled federated learning: A survey. ACM Comput. Surv. 55, 4, Article 70 (Nov. 2022), 35 pages. DOI:[Link]
10.1145/3524104
[193] Pengrui Quan, Wei-Han Lee, Mudhakar Srivatsa, and Mani Srivastava. 2022. Enhancing robustness in federated
learning by supervised anomaly detection. In 26th International Conference on Pattern Recognition (ICPR’22). 996–
1003. DOI:[Link]
[194] Rahul Rai, Manoj Kumar Tiwari, Dmitry Ivanov, and Alexandre Dolgui. 2021. Machine learning in manufacturing
and industry 4.0 applications. Int. J. Prod. Res. 59, 16 (2021), 4773–4778. DOI:[Link]
10.1080/00207543.2021.1956675
[195] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. 2019. Federated learning for emoji
prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329 (2019).
[196] Priyesh Ranjan, Ashish Gupta, Federico Coro, and Sajal K. Das. 2023. Robust federated learning against backdoor
attackers. In IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS’23). IEEE, 1–6.
[197] Chandrasekar Ravi, Anmol Tigga, G. Thippa Reddy, Saqib Hakak, and Mamoun Alazab. 2022. Driver identification
using optimized deep learning model in smart transportation. ACM Trans. Internet Technol. 22, 4 (2022), 1–17.
[198] Amirhossein Reisizadeh, Farzan Farnia, Ramtin Pedarsani, and Ali Jadbabaie. 2020. Robust federated learning: The
case of affine distribution shifts. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato,
R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 21554–21565. Retrieved from:
https://[Link]/paper_files/paper/2020/file/[Link]
[199] Qingqing Ren, Shuyong Zhu, Lu Lu, Zhiqiang Li, Guangyu Zhao, and Yujun Zhang. 2023. NetShield: An in-network
architecture against Byzantine failures in distributed deep learning. Comput. Netw. 237 (2023), 110081.
[200] Lindon Roberts and Edward Smyth. 2022. A simplified convergence theory for Byzantine resilient stochastic gradient
descent. EURO J. Computat. Optim. 10 (2022), 100038. DOI:[Link]
[201] Nuria Rodríguez-Barroso, Eugenio Martínez-Cámara, M. Victoria Luzón, and Francisco Herrera. 2022. Backdoor
attacks-resilient aggregation based on robust filtering of outliers in federated learning for image classification.
Knowl.-based Syst. 245 (2022), 108588. DOI:[Link]
[202] Nuria Rodríguez-Barroso, Eugenio Martínez-Cámara, M. Victoria Luzón, and Francisco Herrera. 2022. Dynamic de-
fense against Byzantine poisoning attacks in federated learning. Fut. Gen. Comput. Syst. 133 (2022), 1–9. DOI:https:
//[Link]/10.1016/[Link].2022.03.003

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.
245:38 Md P. Uddin et al.

[203] Mary Roszel, Robert Norvill, and Radu State. 2022. An analysis of Byzantine-tolerant aggregation mechanisms on
model poisoning in federated learning. In Modeling Decisions for Artificial Intelligence, Vicenç Torra and Yasuo
Narukawa (Eds.). Springer International Publishing, Cham, 143–155.
[204] Alireza Sadeghi, Gang Wang, and Georgios B. Giannakis. 2024. Learning robust to distributional uncertainties and
adversarial data. IEEE J. Select. Areas Inf. Theor. 5 (2024), 105–122.
[205] Sanchita Saha, Ashlesha Hota, Arup Kumar Chattopadhyay, Amitava Nag, and Sukumar Nandi. 2024. A multifaceted
survey on privacy preservation of federated learning: Progress, challenges, and opportunities. Artif. Intell. Rev. 57, 7
(2024), 184.
[206] Pedro Miguel Sánchez Sánchez, Alberto Huertas Celdrán, José Rafael Buendía Rubio, Gérôme Bovet, and Gregorio
Martínez Pérez. 2023. Robust federated learning for execution time-based device model identification under label-
flipping attack. Clust. Comput. 27, 1 (2023), 1–12.
[207] Suhail M. Shah, Liqun Su, and Vincent K. N. Lau. 2023. Robust federated learning over noisy fading channels. IEEE
Internet Things J. 10, 9 (2023), 7993–8013. DOI:[Link]
[208] Atul Sharma, Wei Chen, Joshua Zhao, Qiang Qiu, Saurabh Bagchi, and Somali Chaterji. 2023. FLAIR: Defense against
model poisoning attack in federated learning. In ACM Asia Conference on Computer and Communications Security.
553–566.
[209] Virat Shejwalkar, Amir Houmansadr, Peter Kairouz, and Daniel Ramage. 2022. Back to the drawing board: A critical
evaluation of poisoning attacks on production federated learning. In IEEE Symposium on Security and Privacy (SP’22).
IEEE, 1354–1371.
[210] Karthik Shenoy, Manohara M. M. Pai, and Radhika M. Pai. 2023. Siamese network based anomaly detection frame-
work for robust federated learning. In 3rd International Conference on Intelligent Technologies (CONIT’23). IEEE, 1–6.
[211] Siping Shi, Yingya Guo, Dan Wang, Yifei Zhu, and Zhu Han. 2023. Distributionally robust federated learning for
network traffic classification with noisy labels. IEEE Trans. Mob. Comput. 23, 5 (2023), 6212–6226.
[212] Siping Shi, Chuang Hu, Dan Wang, Yifei Zhu, and Zhu Han. 2022. Distributionally robust federated learning for
differentially private data. In IEEE 42nd International Conference on Distributed Computing Systems (ICDCS’22). 842–
852. DOI:[Link]
[213] Yong Shi, Yuanying Zhang, Yang Xiao, and Lingfeng Niu. 2022. Optimization strategies for client drift in federated
learning: A review. Proced. Comput. Sci. 214 (2022), 1168–1173. DOI:[Link]
[214] Zhaosen Shi, Xuyang Ding, Fagen Li, Yingni Chen, and Canran Li. 2023. Mitigation of a poisoning attack in federated
learning by using historical distance detection. Ann. Telecommun. 78, 3-4 (2023), 135–147.
[215] Ajay Shrestha and Ausif Mahmood. 2019. Review of deep learning algorithms and architectures. IEEE Access 7 (2019),
53040–53065.
[216] Houssem Sifaou and Geoffrey Ye Li. 2022. Robust federated learning via over-the-air computation. In IEEE 32nd Inter-
national Workshop on Machine Learning for Signal Processing (MLSP’22). 1–6. DOI:[Link]
2022.9943401
[217] Sushil Kumar Singh, Jeonghun Cha, Tae Woo Kim, and Jong Hyuk Park. 2021. Machine learning based distributed
big data analysis framework for next generation web in IoT. Comput. Sci. Inf. Syst. 18, 2 (2021), 597–618.
[218] Yile Song, Jiaqi Wang, and Yiming Liu. 2023. Robust federated learning for image semantic transmission under
Byzantine attacks. In IEEE/CIC International Conference on Communications in China (ICCC Workshops’23). IEEE,
1–6.
[219] Qiheng Sun, Xiang Li, Jiayao Zhang, Li Xiong, Weiran Liu, Jinfei Liu, Zhan Qin, and Kui Ren. 2023. ShapleyFL:
Robust federated learning based on Shapley value. In 29th ACM SIGKDD Conference on Knowledge Discovery and
Data Mining. 2096–2108.
[220] Ziteng Sun, Peter Kairouz, Ananda Theertha Suresh, and H. Brendan McMahan. 2019. Can you really backdoor
federated learning? arXiv preprint arXiv:1911.07963 (2019).
[221] E. Tahanian, M. Amouei, H. Fateh, and M. Rezvani. 2021. A game-theoretic approach for robust federated learning.
Int. J. Eng. 34, 4 (2021), 832–842.
[222] Rahim Taheri, Mohammad Shojafar, Mamoun Alazab, and Rahim Tafazolli. 2021. Fed-IIoT: A robust federated
malware detection architecture in industrial IoT. IEEE Trans. Industr. Inform. 17, 12 (2021), 8442–8452. DOI:https:
//[Link]/10.1109/TII.2020.3043458
[223] Farnaz Tahmasebian, Jian Lou, and Li Xiong. 2022. RobustFed: A truth inference approach for robust federated
learning. In 31st ACM International Conference on Information & Knowledge Management (CIKM’22). Association
for Computing Machinery, New York, NY, USA, 1868–1877. DOI:[Link]
[224] Youming Tao, Sijia Cui, Wenlu Xu, Haofei Yin, Dongxiao Yu, Weifa Liang, and Xiuzhen Cheng. 2023. Byzantine-
resilient federated learning at edge. IEEE Trans. Comput. 72, 9 (2023), 2600–2614.
[225] Thin Tharaphe Thein, Yoshiaki Shiraishi, and Masakatu Morii. 2024. Personalized federated learning-based intrusion
detection system: Poisoning attack and defense. Fut. Gen. Comput. Syst. 153 (2024), 182–192.

[226] Youliang Tian, Shuai Wang, Jinbo Xiong, Renwan Bi, Zhou Zhou, and Md Zakirul Alam Bhuiyan. 2023. Robust
and privacy-preserving decentralized deep federated learning training: Focusing on digital healthcare applications.
IEEE/ACM Trans. Computat. Biol. Bioinform. 21, 4 (2023), 890–901.
[227] Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. 2020. Data poisoning attacks against federated
learning systems. In Computer Security–ESORICS 2020: 25th European Symposium on Research in Computer Security,
ESORICS 2020, Guildford, UK, September 14–18, 2020, Proceedings, Part I 25. Springer, 480–501.
[228] Pradyumna Kumar Tripathy, Anurag Shrivastava, Varsha Agarwal, Devangkumar Umakant Shah, and S. V. Akilan-
deeswari. 2022. Federated learning algorithm based on matrix mapping for data privacy over edge computing. Int. J.
Pervas. Comput. Commun. 20, 5 (2022), 633–647.
[229] Vu Tuan Truong, Duc N. M. Hoang, and Long Bao Le. 2023. BFLMeta: Blockchain-empowered metaverse with
Byzantine-robust federated learning. In IEEE Global Communications Conference (GLOBECOM’23). IEEE, 5537–5542.
[230] Md Palash Uddin, Yong Xiang, Borui Cai, Xuequan Lu, John Yearwood, and Longxiang Gao. 2023. ARFL: Adaptive
and robust federated learning. IEEE Trans. Mob. Comput. 23, 5 (2023), 5401–5417.
[231] Md Palash Uddin, Yong Xiang, John Yearwood, and Longxiang Gao. 2022. Robust federated averaging via outlier
pruning. IEEE Sig. Process. Lett. 29 (2022), 409–413. DOI:[Link]
[232] Wei Wan, Shengshan Hu, Minghui Li, Jianrong Lu, Longling Zhang, Leo Yu Zhang, and Hai Jin. 2023. A four-pronged
defense against Byzantine attacks in federated learning. In 31st ACM International Conference on Multimedia. 7394–
7402.
[233] Che Wang, Zhenhao Wu, Jianbo Gao, Jiashuo Zhang, Junjie Xia, Feng Gao, Zhi Guan, and Zhong Chen. 2024. FedTop:
A constraint-loosed federated learning aggregation method against poisoning attack. Front. Comput. Sci. 18, 5 (2024),
185348.
[234] Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kang-
wook Lee, and Dimitris Papailiopoulos. 2020. Attack of the tails: Yes, you really can backdoor federated learning.
In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and
H. Lin (Eds.), Vol. 33. Curran Associates, Inc., 16070–16084. Retrieved from: [Link]
files/paper/2020/file/[Link]
[235] Hanyu Wang, Liming Wang, and Hongjia Li. 2023. Byzantine-robust federated learning through dynamic clustering.
In IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom’23).
IEEE, 222–230.
[236] Mengxin Wang, Liming Fang, and Kuiqi Chen. 2023. FedNAT: Byzantine-robust federated learning through
activation-based attention transfer. In IEEE International Conference on Data Mining Workshops (ICDMW’23). IEEE,
1005–1012.
[237] Ning Wang, Yang Xiao, Yimin Chen, Yang Hu, Wenjing Lou, and Y. Thomas Hou. 2022. FLARE: Defending federated
learning against model poisoning attacks via latent space representations. In ACM on Asia Conference on Computer
and Communications Security (ASIA CCS’22). Association for Computing Machinery, New York, NY, USA, 946–958.
DOI:[Link]
[238] Naiyu Wang, Wenti Yang, Xiaodong Wang, Longfei Wu, Zhitao Guan, Xiaojiang Du, and Mohsen Guizani. 2022. A
blockchain based privacy-preserving federated learning scheme for internet of vehicles. Digit. Commun. Netw. 10, 1
(2022), 126–134. DOI:[Link]
[239] Silv Wang, Kai Fan, Kuan Zhang, Hui Li, and Yintang Yang. 2022. Data complexity-based batch sanitization method
against poison in distributed learning. Digit. Commun. Netw. 10, 2 (2022), 416–428. DOI:[Link]
2022.12.001
[240] Tao Wang, Bo Zhao, and Liming Fang. 2023. FLForest: Byzantine-robust federated learning through isolated forest.
In IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS’23). IEEE, 296–303.
[241] Yanmeng Wang, Yanqing Xu, Qingjiang Shi, and Tsung-Hui Chang. 2021. Robust federated learning in wireless
channels with transmission outage and quantization errors. In IEEE 22nd International Workshop on Signal Processing
Advances in Wireless Communications (SPAWC’21). 586–590. DOI:[Link]
[242] Yongkang Wang, Di-Hua Zhai, and Yuanqing Xia. 2023. SCFL: Mitigating backdoor attacks in federated learning
based on SVD and clustering. Comput. Secur. 133 (2023), 103414.
[243] Zhonglin Wang and Ping Zhao. 2024. Byzantine detection for federated learning under highly non-IID data and
majority corruptions. Wirel. Netw. 31, 1 (2024), 1–13.
[244] Ming Wei, Xiaofan Liu, and Wei Ren. 2022. RoFL: A robust federated learning scheme against malicious attacks. In
Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and
Big Data. Springer, 277–291.
[245] Jie Wen, Zhixia Zhang, Yang Lan, Zhihua Cui, Jianghui Cai, and Wensheng Zhang. 2023. A survey on federated
learning: Challenges and applications. Int. J. Mach. Learn. Cybern. 14, 2 (2023), 513–535.

[246] Di Wu, Nai Wang, Jiale Zhang, Yuan Zhang, Yong Xiang, and Longxiang Gao. 2022. A blockchain-based multi-
layer decentralized framework for robust federated learning. In International Joint Conference on Neural Networks
(IJCNN’22). 1–8. DOI:[Link]
[247] Ning Wu, Xiaoming Lin, Jianbin Lu, Fan Zhang, Weidong Chen, Jianlin Tang, and Jing Xiao. 2024. Byzantine-robust
multimodal federated learning framework for intelligent connected vehicle. Electronics 13, 18 (2024), 3635.
[248] Nannan Wu, Li Yu, Xuefeng Jiang, Kwang-Ting Cheng, and Zengqiang Yan. 2023. FedNoRo: Towards noise-robust
federated learning by addressing class imbalance and label noise heterogeneity. In 32nd International Joint Conference
on Artificial Intelligence. 4424–4432.
[249] Ruolan Wu, Yuling Chen, Chaoyue Tan, and Yun Luo. 2023. MDIFL: Robust federated learning based on malicious
detection and incentives. Appl. Sci. 13, 5 (2023), 2793.
[250] Tianyu Wu, Shizhu He, Jingping Liu, Siqi Sun, Kang Liu, Qing-Long Han, and Yang Tang. 2023. A brief overview
of ChatGPT: The history, status quo and potential future development. IEEE/CAA J. Autom. Sinic. 10, 5 (2023),
1122–1136.
[251] Peng Xi, Wenjuan Tang, Kun Xie, Xuan Liu, Peng Zhao, and Shaoliang Peng. 2023. RobustHealthFL: Robust strategy
against malicious clients in non-iid healthcare federated learning. In IEEE International Conference on Bioinformatics
and Biomedicine (BIBM’23). IEEE, 1545–1552.
[252] Feng Xia and Wenhao Cheng. 2024. A survey on privacy-preserving federated learning against poisoning attacks.
Clust. Comput. 27, 10 (2024), 1–18.
[253] Qi Xia, Zeyi Tao, and Qun Li. 2021. ToFi: An algorithm to defend against Byzantine attacks in federated learning. In
Security and Privacy in Communication Networks, Joaquin Garcia-Alfaro, Shujun Li, Radha Poovendran, Hervé Debar,
and Moti Yung (Eds.). Springer International Publishing, Cham, 229–248.
[254] Zihang Xiang, Tianhao Wang, Wanyu Lin, and Di Wang. 2023. Practical differentially private and Byzantine-resilient
federated learning. Proc. ACM Manag. Data 1, 2 (2023), 1–26.
[255] Chulin Xie, Minghao Chen, Pin-Yu Chen, and Bo Li. 2021. CRFL: Certifiably robust federated learning against
backdoor attacks. In Proceedings of the 38th International Conference on Machine Learning (Proceedings of Machine
Learning Research), Marina Meila and Tong Zhang (Eds.), Vol. 139. PMLR, 11372–11382. Retrieved from:
https://[Link]/v139/[Link]
[256] Ming Xie, Jie Ma, Guodong Long, and Chengqi Zhang. 2022. Robust clustered federated learning with bootstrap
median-of-means. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International
Conference on Web and Big Data. Springer, 237–250.
[257] Chang Xu, Yu Jia, Liehuang Zhu, Chuan Zhang, Guoxie Jin, and Kashif Sharif. 2022. TDFL: Truth discovery based
Byzantine robust federated learning. IEEE Trans. Parallel Distrib. Syst. 33, 12 (2022), 4835–4848. DOI:[Link]
10.1109/TPDS.2022.3205714
[258] Jie Xu, Benjamin S. Glicksberg, Chang Su, Peter Walker, Jiang Bian, and Fei Wang. 2021. Federated learning for
healthcare informatics. J. Healthc. Inform. Res. 5 (2021), 1–19.
[259] Jian Xu, Shao-Lun Huang, Linqi Song, and Tian Lan. 2022. Byzantine-robust federated learning through collabora-
tive malicious gradient filtering. In IEEE 42nd International Conference on Distributed Computing Systems (ICDCS’22).
1223–1235. DOI:[Link]
[260] Jing Xu, Stefanos Koffas, and Stjepan Picek. 2024. Unveiling the threat: Investigating distributed and centralized
backdoor attacks in federated graph neural networks. Digit. Threats: Res. Pract. 5, 2 (2024), 1–29.
[261] Mengfan Xu and Xinghua Li. 2022. FedBC: An efficient and privacy-preserving federated consensus scheme. In
Security and Privacy in Social Networks and Big Data, Xiaofeng Chen, Xinyi Huang, and Mirosław Kutyłowski (Eds.).
Springer Nature Singapore, Singapore, 148–162.
[262] Shuo Xu, Hui Xia, Rui Zhang, Peishun Liu, and Yu Fu. 2024. FedNor: A robust training framework for federated
learning based on normal aggregation. Inf. Sci. 684 (2024), 121274.
[263] Xun Xu, Yuqian Lu, Birgit Vogel-Heuser, and Lihui Wang. 2021. Industry 4.0 and Industry 5.0—Inception, conception
and perception. J. Manuf. Syst. 61 (2021), 530–535.
[264] Yichang Xu, Ming Yin, Minghong Fang, and Neil Zhenqiang Gong. 2024. Robust federated learning mitigates client-
side training data distribution inference attacks. In ACM on Web Conference. 798–801.
[265] Chengang Yang, Danyang Xiao, Bokai Cao, and Weigang Wu. 2022. RTGA: Robust ternary gradients aggregation for
federated learning. Inf. Sci. 616 (2022), 427–443. DOI:[Link]
[266] Han Yang, Dongbing Gu, and Jianhua He. 2023. DeMAC: Towards detecting model poisoning attacks in federated
learning system. Internet Things 23 (2023), 100875.
[267] Mingkun Yang, Ran Zhu, Qing Wang, and Jie Yang. 2024. FedTrans: Client-transparent utility estimation for robust
federated learning. In 12th International Conference on Learning Representations.
[268] Seunghan Yang, Hyoungseob Park, Junyoung Byun, and Changick Kim. 2022. Robust federated learning with noisy
labels. IEEE Intell. Syst. 37, 2 (2022), 35–43. DOI:[Link]

[269] Yi-Rui Yang and Wu-Jun Li. 2023. Buffered asynchronous SGD for Byzantine learning. J. Mach. Learn. Res. 24, 204
(2023), 1–62.
[270] Zheng Yang, Ke Gu, and Yiming Zuo. 2024. Byzantine robust federated learning scheme based on backdoor triggers.
Comput., Mater. Contin. 79, 2 (2024), 2813–2831.
[271] Jingyi Yao, Chen Song, Hongjia Li, Yuxiang Wang, Qian Yang, and Liming Wang. 2024. An enclave-aided Byzantine-
robust federated aggregation framework. In IEEE Wireless Communications and Networking Conference (WCNC’24).
IEEE, 1–6.
[272] Wenbin Yao, Bangli Pan, Yingying Hou, Xiaoyong Li, and Yamei Xia. 2023. An adaptive model filtering algorithm
based on Grubbs test in federated learning. Entropy 25, 5 (2023), 715.
[273] Liping Yi, Xiaorong Shi, Wenrui Wang, Gang Wang, and Xiaoguang Liu. 2023. FedRRA: Reputation-aware robust
federated learning against poisoning attacks. In International Joint Conference on Neural Networks (IJCNN’23). IEEE,
1–8.
[274] Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. 2018. Byzantine-robust distributed learning:
Towards optimal statistical rates. In 35th International Conference on Machine Learning (Proceedings of Machine Learning
Research), Jennifer Dy and Andreas Krause (Eds.), Vol. 80. PMLR, 5650–5659. Retrieved from: [Link]
press/v80/[Link]
[275] Xuefei Yin, Yanming Zhu, and Jiankun Hu. 2021. A comprehensive survey of privacy-preserving federated learning:
A taxonomy, review, and future directions. ACM Comput. Surv. 54, 6, Article 131 (July 2021), 36 pages. DOI:https:
//[Link]/10.1145/3460427
[276] Bin Yu, Wenjie Mao, Yihan Lv, Chen Zhang, and Yu Xie. 2022. A survey on federated learning in data mining. Wiley
Interdiscip. Rev.: Data Min. Knowl. Discov. 12, 1 (2022), e1443.
[277] Lei Yu and Lingfei Wu. 2020. Towards Byzantine-resilient Federated Learning via Group-wise Robust Aggregation.
Springer International Publishing, Cham, 81–92. DOI:[Link]
[278] Pengqian Yu, Achintya Kundu, Laura Wynter, and Shiau Hong Lim. 2022. Personalized, Robust Federated Learning
with Fed+. Springer International Publishing, Cham, 99–123. DOI:[Link]
[279] Syed Zawad, Ahsan Ali, Pin-Yu Chen, Ali Anwar, Yi Zhou, Nathalie Baracaldo, Yuan Tian, and Feng Yan. 2021. Curse
or redemption? How data heterogeneity affects the robustness of federated learning. In AAAI Conference on Artificial
Intelligence, Vol. 35. 10807–10814.
[280] Honghong Zeng, Jie Li, Jiong Lou, Shijing Yuan, Chentao Wu, Wei Zhao, Sijin Wu, and Zhiwen Wang. 2024. BSR-FL:
An efficient Byzantine-robust privacy-preserving federated learning framework. IEEE Trans. Comput. 73, 8 (2024),
2096–2110.
[281] Kun Zhai, Qiang Ren, Junli Wang, and Chungang Yan. 2021. Byzantine-robust federated learning via credibility
assessment on non-IID data. CoRR abs/2109.02396 (2021).
[282] Chen Zhang, Yu Xie, Hang Bai, Bin Yu, Weihong Li, and Yuan Gao. 2021. A survey on federated learning. Knowl.-based
Syst. 216 (2021), 106775.
[283] Chang Zhang, Shunkun Yang, Lingfeng Mao, and Huansheng Ning. 2024. Anomaly detection and defense techniques
in federated learning: A comprehensive review. Artif. Intell. Rev. 57, 6 (2024), 1–34.
[284] Heyi Zhang, Jun Wu, Qianqian Pan, Ali Kashif Bashir, and Marwan Omar. 2024. Toward Byzantine-robust distributed
learning for sentiment classification on social media platform. IEEE Trans. Computat. Soc. Syst. (2024), 1–11.
[285] Jiale Zhang, Bing Chen, Xiang Cheng, Huynh Thi Thanh Binh, and Shui Yu. 2020. PoisonGAN: Generative poisoning
attacks against federated learning in edge computing systems. IEEE Internet Things J. 8, 5 (2020), 3310–3322.
[286] Jiale Zhang, Chunpeng Ge, Feng Hu, and Bing Chen. 2022. RobustFL: Robust federated learning against poisoning
attacks in industrial IoT systems. IEEE Trans. Industr. Inform. 18, 9 (2022), 6388–6397. DOI:[Link]
2021.3132954
[287] Jie Zhang, Bo Li, Chen Chen, Lingjuan Lyu, Shuang Wu, Shouhong Ding, and Chao Wu. 2023. Delving into the
adversarial robustness of federated learning. In AAAI Conference on Artificial Intelligence, Vol. 37. 11245–11253.
[288] Jinghui Zhang, Dingyang Lv, Qiangsheng Dai, Fa Xin, and Fang Dong. 2023. Noise-aware local model training mech-
anism for federated learning. ACM Trans. Intell. Syst. Technol. 14, 4 (2023), 1–22.
[289] Kaiyue Zhang, Xuan Song, Chenhan Zhang, and Shui Yu. 2022. Challenges and future directions of secure federated
learning: A survey. Front. Comput. Sci. 16 (2022), 1–8.
[290] Lefeng Zhang, Tianqing Zhu, Ping Xiong, Wanlei Zhou, and Philip S. Yu. 2023. A robust game-theoretical federated
learning framework with joint differential privacy. IEEE Trans. Knowl. Data Eng. 35, 4 (2023), 3333–3346. DOI:https:
//[Link]/10.1109/TKDE.2021.3140131
[291] Xudong Zhang and Lei Wang. 2024. High-dimensional M-estimation for Byzantine-robust decentralized learning.
Inf. Sci. 653 (2024), 119808.
[292] Yanjun Zhang, Guangdong Bai, Mahawaga Arachchige Pathum Chamikara, Mengyao Ma, Liyue Shen, Jingwei Wang,
Surya Nepal, Minhui Xue, Long Wang, and Joseph Liu. 2023. AgrEvader: Poisoning membership inference against
Byzantine-robust federated learning. In ACM Web Conference. 2371–2382.

[293] Yanting Zhang, Jianwei Liu, Zhenyu Guan, Bihe Zhao, Xianglun Leng, and Song Bian. 2023. ARMOR: Differential
model distribution for adversarially robust federated learning. Electronics 12, 4 (2023), 842.
[294] Yifei Zhang, Dun Zeng, Jinglong Luo, Xinyu Fu, Guanzhong Chen, Zenglin Xu, and Irwin King. 2024. A survey of
trustworthy federated learning: Issues, solutions, and challenges. ACM Trans. Intell. Syst. Technol. 15, 6 (2024).
[295] Zaixi Zhang, Xiaoyu Cao, Jinyuan Jia, and Neil Zhenqiang Gong. 2022. FLDetector: Defending federated learn-
ing against model poisoning attacks via detecting malicious clients. In 28th ACM SIGKDD Conference on Knowl-
edge Discovery and Data Mining (KDD’22). Association for Computing Machinery, New York, NY, USA, 2545–2555.
DOI:[Link]
[296] Zhiyuan Zhang, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, and Xu Sun. 2024. Fed-FA: Theoretically modeling
client data divergence for federated language backdoor defense. Advan. Neural Inf. Process. Syst. 36 (2024), 62006–
62031.
[297] Zikai Zhang and Rui Hu. 2023. Byzantine-robust federated learning with variance reduction and differential privacy.
In IEEE Conference on Communications and Network Security (CNS’23). IEEE, 1–9.
[298] Zhuangzhuang Zhang, Libing Wu, Debiao He, Jianxin Li, Shuqin Cao, and Xianfeng Wu. 2023. Communication-efficient
and Byzantine-robust federated learning for mobile edge computing networks. IEEE Netw. 37, 4 (2023), 112–119.
[299] Zhuangzhuang Zhang, Libing Wu, Debiao He, Jianxin Li, Na Lu, and Xuejiang Wei. 2024. Using third-party auditor
to help federated learning: An efficient Byzantine-robust federated learning. IEEE Trans. Sustain. Comput. 9, 6 (2024),
848–861.
[300] Zhao Zhang, Yong Zhang, Da Guo, Lei Yao, and Zhao Li. 2022. SecFedNIDS: Robust defense for poisoning attack
against federated learning-based network intrusion detection system. Fut. Gen. Comput. Syst. 134 (2022), 154–169.
DOI:[Link]
[301] Bo Zhao, Tao Wang, and Liming Fang. 2023. FedCom: Byzantine-robust federated learning using data commitment.
In IEEE International Conference on Communications (ICC’23). IEEE, 33–38.
[302] Chen Zhao, Yu Wen, Shuailou Li, Fucheng Liu, and Dan Meng. 2021. FederatedReverse: A detection and defense
method against backdoor attacks in federated learning. In ACM Workshop on Information Hiding and Multimedia
Security (IH&MMSec ’21). Association for Computing Machinery, New York, NY, USA, 51–62. DOI:[Link]
1145/3437880.3460403
[303] Lingchen Zhao, Jianlin Jiang, Bo Feng, Qian Wang, Chao Shen, and Qi Li. 2022. SEAR: Secure and efficient aggregation
for Byzantine-robust federated learning. IEEE Trans. Depend. Secure Comput. 19, 5 (2022), 3329–3342. DOI:https:
//[Link]/10.1109/TDSC.2021.3093711
[304] Ping Zhao, Jin Jiang, and Guanglin Zhang. 2024. FedSuper: A Byzantine-robust federated learning under supervision.
ACM Trans. Sensor Netw. 20, 2 (2024), 1–29.
[305] Puning Zhao, Fei Yu, and Zhiguo Wan. 2024. A Huber loss minimization approach to Byzantine robust federated
learning. In AAAI Conference on Artificial Intelligence, Vol. 38. 21806–21814.
[306] Shihai Zhao, Juncheng Pu, Xiaodong Fu, Li Liu, and Fei Dai. 2024. Byzantine-robust federated learning with ensemble
incentive mechanism. Fut. Gen. Comput. Syst. 159 (2024), 272–283.
[307] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. 2018. Federated learning with
non-IID data. arXiv preprint arXiv:1806.00582 (2018).
[308] Xu Zheng, Qihao Dong, and Anmin Fu. 2022. WMDefense: Using watermark to defense Byzantine attacks in feder-
ated learning. In IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS’22). 1–6. DOI:https:
//[Link]/10.1109/INFOCOMWKSHPS54753.2022.9798217
[309] Fangtong Zhou, Ruozhou Yu, Zhouyu Li, Huayue Gu, and Xiaojian Wang. 2022. FedAegis: Edge-based Byzantine-
robust federated learning for heterogeneous data. In IEEE Global Communications Conference (GLOBECOM’22). 3005–
3010. DOI:[Link]
[310] Hongliang Zhou, Yifeng Zheng, Hejiao Huang, Jiangang Shu, and Xiaohua Jia. 2023. Toward robust hierarchical
federated learning in internet of vehicles. IEEE Trans. Intell. Transp. Syst. 24, 5 (2023), 5600–5614.
[311] Jingxuan Zhou, Zhengyi Zhong, Ji Wang, Xiongtao Zhang, Fuxu Chen, and Chun Yan. 2021. Robust federated learn-
ing with adaptable learning rate. In 7th International Conference on Big Data and Information Analytics (BigDIA’21).
485–490. DOI:[Link]
[312] Ting Zhou, Ning Liu, Bo Song, Hongtao Lv, Deke Guo, and Lei Liu. 2024. RobFL: Robust federated learning via
feature center separation and malicious center detection. In IEEE 40th International Conference on Data Engineering
(ICDE’24). IEEE, 926–938.
[313] Yao Zhou, Jun Wu, Haixun Wang, and Jingrui He. 2022. Adversarial robustness through bias variance decompo-
sition: A new perspective for federated learning. In 31st ACM International Conference on Information & Knowl-
edge Management (CIKM’22). Association for Computing Machinery, New York, NY, USA, 2753–2762. DOI:https:
//[Link]/10.1145/3511808.3557232

[314] Banghua Zhu, Lun Wang, Qi Pang, Shuai Wang, Jiantao Jiao, Dawn Song, and Michael I. Jordan. 2023. Byzantine-
robust federated learning with optimal statistical rates. In International Conference on Artificial Intelligence and Sta-
tistics. PMLR, 3151–3178.
[315] Heng Zhu and Qing Ling. 2022. Byzantine-robust aggregation with gradient difference compression and stochastic
variance reduction for federated learning. In IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP’22). 4278–4282. DOI:[Link]
[316] Hangyu Zhu, Jinjin Xu, Shiqing Liu, and Yaochu Jin. 2021. Federated learning on non-IID data: A survey. Neurocom-
puting 465 (2021), 371–390.
[317] Juncen Zhu, Jiannong Cao, Divya Saxena, Shan Jiang, and Houda Ferradi. 2023. Blockchain-empowered federated
learning: Challenges, solutions, and future directions. ACM Comput. Surv. 55, 11, Article 240 (Feb. 2023), 31 pages.
DOI:[Link]
[318] Lingzi Zhu, Bo Zhao, Weidong Li, Yixuan Wang, and Yang An. 2024. TICPS: A trustworthy collaborative intrusion
detection framework for industrial cyber–physical systems. Ad Hoc Netw. 160 (2024), 103517.
[319] Ran Zhu, Mingkun Yang, Jie Yang, and Qing Wang. 2023. FedNaWi: Selecting the befitting clients for robust federated
learning in IoT applications. In 20th Annual IEEE International Conference on Sensing, Communication, and Networking
(SECON’23). IEEE, 402–410.
[320] Tengteng Zhu, Zehua Guo, Chao Yao, Jiaxin Tan, Songshi Dou, Wenrun Wang, and Zhenzhen Han. 2024. Byzantine-
robust federated learning via cosine similarity aggregation. Comput. Netw. (2024), 110730.
[321] Wenting Zhu, Zhe Liu, Zongyi Chen, Chuan Shi, Xi Zhang, and Sanchuan Guo. 2023. FedValidate: A robust federated
learning framework based on client-side validation. In 8th International Conference on Data Science in Cyberspace
(DSC’23). IEEE, 337–344.
[322] Yizheng Zhu, Yuncheng Wu, Zhaojing Luo, Beng Chin Ooi, and Xiaokui Xiao. 2023. Secure and verifiable data col-
laboration with low-cost zero-knowledge proofs. arXiv preprint arXiv:2311.15310 (2023).
[323] Yuanshao Zhu, Shuyu Zhang, Yi Liu, Dusit Niyato, and James J. Q. Yu. 2020. Robust federated learning approach
for travel mode identification from non-IID GPS trajectories. In IEEE 26th International Conference on Parallel and
Distributed Systems (ICPADS’20). 585–592. DOI:[Link]
[324] Hangyu Zhu, Haoyu Zhang, and Yaochu Jin. 2021. From federated learning to federated neural architecture search:
A survey. In Complex & Intelligent Systems. Springer, 639–657.

Appendix
A Consolidated Thematic Result Analysis
A.1 Client-side Treatment
A.1.1 Objective Regularization. Table 5 provides a comparative analysis of the objective
regularization-based approaches for ensuring the robustness of FL and highlights potential gaps
in their effectiveness.
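
To make the idea of objective regularization concrete, the sketch below shows one common form: a FedProx-style proximal (L2) penalty that pulls a client's local model toward the current global model. It is an illustrative sketch only and is not the method of any specific reference in Table 5. The squared-error loss, the plain gradient-descent loop, and all names (`local_gradient`, `proximal_local_update`, `distance`, `mu`, `lr`, `steps`) are assumptions introduced here.

```python
from typing import List, Sequence

Vector = List[float]


def local_gradient(w: Sequence[float],
                   xs: Sequence[Sequence[float]],
                   ys: Sequence[float]) -> Vector:
    """Gradient of the local loss 1/(2n) * sum((x . w - y)^2) (assumed loss)."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj / n
    return grad


def proximal_local_update(w_global: Sequence[float],
                          xs: Sequence[Sequence[float]],
                          ys: Sequence[float],
                          mu: float = 0.1,
                          lr: float = 0.1,
                          steps: int = 50) -> Vector:
    """Gradient descent on loss(w) + (mu / 2) * ||w - w_global||^2.

    The second term is the L2 (proximal) regularizer: its gradient,
    mu * (w - w_global), penalizes drift away from the global model.
    """
    w = list(w_global)  # local training starts from the global model
    for _ in range(steps):
        g = local_gradient(w, xs, ys)
        w = [wi - lr * (gi + mu * (wi - wgi))
             for wi, gi, wgi in zip(w, g, w_global)]
    return w


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two parameter vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
```

With `mu=0.0` the update reduces to ordinary local training. Increasing `mu` keeps the returned model closer to `w_global`, which is the mechanism by which such a penalty can limit the influence of divergent or corrupted local data on the update a client sends back.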

Table 5. Objective Regularization-based Treatments to Attain Robustness in FL


| Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [120] | Robust to data and model poisoning attacks | L2 regularization-based training | Uses the existing L2 regularization, analyzed in the FedProx method | CV | Fashion MNIST, CelebA and FEMNIST | Experimental | No exploration on the applicability of the proposed method to other attacks such as backdoor attacks; Limited experiments performed in the classical FL context; No experiments on non-IID data |
| [66] | Robust to malicious updates | Jacobian regularization-based training | Uses Jacobian regularization to increase model stability against model perturbations | CV | CIFAR10, MNIST, SVHN, and Fashion-MNIST | Experimental | No treatment on the models aggregation; Limited experiments performed in the classical FL context; Limited experiments on non-IID data |
| [8] | Robust to noise from wireless communications | Sampling-based successive convex approximation-driven regularization | Uses a regularization for the loss function approximation via sampling-based successive convex approximation | CV | MNIST | Theoretical and Experimental | Regularization is not made adaptive; Limited experiments performed in classical FL context; No experiments on non-IID data |
| [27] | Robust to noisy labeled data | Loss function correction using a re-weighted mechanism | Edge devices learn to assign the local weights of loss functions in noisy labeled datasets and cooperate with the central server to update global weights | CV | MNIST, and CIFAR10 | Theoretical and Experimental | Requires a local clean test dataset in each client; Limited experiments on non-IID data |
| [82] | Robust to noisy labels (label flipping attacks) | Local self-regularization method | Improves local training by preventing memorization of noisy labels and narrowing the gap between original and augmented data using self-distillation | CV | MNIST, Fashion-MNIST, CIFAR10, Clothing1M | Experimental | Requires augmented data; Limited experiments on non-IID data; Limited poisoning attacks tested |
| [322] | Robust to Byzantine attacks | Secure aggregation | RiseFL framework using probabilistic L2-norm checks and hybrid commitment schemes | Medical Imaging | Synthetic, Real-world tabular data, Medical image datasets | Experimental and Theoretical | High computation and communication cost; Scalability challenges with large numbers of clients |
| [101] | Robust to adversarial attacks and Byzantine attacks | Adaptive selection of loss function | Adaptive loss function selection based on whether a client is under attack, combined with Byzantine-robust aggregation | Intrusion Detection | CICMalDroid 2020 | Experimental | The combination of adversarial training and Byzantine-robust aggregation did not complement each other as expected, leading to potential inefficiencies in model training stability and convergence |
Robust to
Huber loss Minimizing multi-dimensional Huber loss Synthesized and Theoretical and Needs further analysis for extreme cases of
[305] Byzantine Not specified
minimization to aggregate gradients real data Experimental non-IID data and highly imbalanced data
attacks
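
As an illustration of the L2 regularization that [120] takes from FedProx, the following minimal sketch adds the proximal penalty (mu/2)·||w − w_global||² to a client's local gradient step. The function names, the list-based weight vectors, and the values of `mu` and `lr` are assumptions made for this example; they are not taken from [120].

```python
def proximal_penalty(local_w, global_w, mu):
    """FedProx-style proximal term: (mu / 2) * ||local_w - global_w||^2."""
    return 0.5 * mu * sum((w - g) ** 2 for w, g in zip(local_w, global_w))


def proximal_gradient(local_w, global_w, mu):
    """Gradient of the proximal term with respect to the local weights."""
    return [mu * (w - g) for w, g in zip(local_w, global_w)]


def local_step(local_w, task_grad, global_w, mu, lr):
    """One gradient step on (task loss + proximal term).

    task_grad is the gradient of the client's own loss (assumed to be given);
    the proximal gradient pulls the local weights back toward the global model.
    """
    prox = proximal_gradient(local_w, global_w, mu)
    return [w - lr * (g + p) for w, g, p in zip(local_w, task_grad, prox)]


# Hypothetical values: with a zero task gradient the step only shrinks
# the local weights toward the global model.
new_w = local_step([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], mu=0.1, lr=1.0)
```

A larger `mu` keeps local models closer to the global model, which limits how far a single client can drift in one round.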

A.1.2 Optimizer Modification. The optimizer modification-based approaches for ensuring FL
robustness are compared comprehensively in Table 6.
A.1.3 Differential Privacy Employment. We provide a comprehensive comparative unit analysis
indicating the potential gaps in the differential privacy employment-based approaches for ensuring
FL robustness in Table 7.
A.1.4 Additional Dataset Requirement and Decentralization Orchestration. We provide a comprehensive
comparative analysis, outlining the potential gaps in the additional dataset requirement
and decentralization orchestration-based approaches for ensuring FL robustness in Table 8.
A.1.5 Manifold. The manifold methods for ensuring FL robustness are summarized in Tables 9
and 10.

A.2 Server-side Treatment


A.2.1 Client Selection. We provide a comprehensive comparative unit analysis indicating the
potential gaps of the client selection-based approaches for ensuring FL robustness in Tables 11–16.
A.2.2 Aggregation Approach Modification. We provide a comprehensive comparative unit analysis
indicating the potential gaps of the aggregation approach modification-based approaches for
ensuring FL robustness in Tables 17–19.


Table 6. Optimizer Modification-based Treatments to Attain Robustness in FL

| Ref. | Considered Robustness | Robustness Approach | Robustness Technique | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [200] | Robust to malicious clients | Modification of the SGD optimizer for making it Byzantine-resilient SGD | Simplified convergence theory is used to develop the generic Byzantine-resilient SGD | - | - | Theoretical | No new robustness method proposed; No experimental validation |
| [137] | Robust to malicious models | Distributed gradient-descent with robust gradient aggregation | Allowing all the non-faulty agents to compute the minimum point of only the non-faulty agents’ aggregate cost, instead of all the agents | - | - | Theoretical | No non-IID data consideration; Just model poisoning attacks tested without data poisoning attacks; Limited attack-robust baselines |
| [44] | Robust to Byzantine attack or noisy clients | SGD-based best effort voting to Byzantine attack | SGD is modified by allowing all the workers to send their local updates at their maximum transmit power to make it robust to the Byzantine attack | IoT | MNIST | Theoretical and Experimental | It performs better in strong Byzantine attack but in the weakest Byzantine attack, sometimes it falls. No experiments on non-IID data |
| [315] | Robust to poisonous models | Use of stochastic average gradient algorithm | Reducing the compression noise with gradient difference compression to improve the Byzantine-robustness | CV | COVTYPE | Theoretical and Experimental | No considering Non-IID settings; Limited Datasets (only one); Limited FL settings; Limited baselines; Limited attack settings (specifically fewer attack scenarios consideration) |
| [184] | Robust to Byzantine attacks | Stochastic average gradient algorithm (SAGA) for client-side training followed by resampling strategy | SAGA is designed to update the local model to eliminate the outer variation caused by Byzantine clients via resample-then-aggregate strategy for selecting the variation-reduced gradient to aggregate with geometric median aggregation method | CV | MNIST, COVTYPE, and CIFAR100 | Theoretical and Experimental | Limited non-IID data settings; Unstable performance on non-IID data with diverse attacks |
| [261] | Robust to Byzantine nodes | Gradient-similarity-based secure consensus | Each consensus round involves all nodes transmitting the DO sub-gradients, which are carefully clustered and divided. The dynamic removal of Byzantine gradients is then carried out in each round of the consensus procedure. | CV | MNIST and EMNIST | Experimental | Increases computation and communication costs, Limited experiments performed in the classical FL context; No experiments on non-IID data |
| [198] | Robust to Affine distribution shifts | A minimax optimization method | A minimax distributed optimization method is formulated to guard against affine distribution shifts | CV | MNIST, and CIFAR10 | Theoretical and Experimental | Limited affine distribution shifts tested; Limited experiments performed in classical FL context; No experiments on non-IID data |
| [265] | Robust to targeted and untargeted attacks | Robust Ternary Gradients Aggregation combining ternary gradients quantization | Ternary quantization compresses client gradients to 2 bits per vector point. Error-feedback gauges real-compressed gradient variance. The server’s aggregation detects malicious clients based on risk scores from divergence optimization. | CV | MNIST, and CIFAR10 | Theoretical and Experimental | Exploring the relationship between quantization methods and robust aggregation algorithms are missing; Limited experiments performed in classical FL context; Limited experiments on non-IID data |
| [269] | Robust to Byzantine attacks and poisoning attacks | Buffered asynchronous stochastic gradient descent | Use of buffers to store gradients before aggregation, adaptive reassignment of workers to buffers, local momentum for enhanced Byzantine resilience | CV | CIFAR-10, ImageNet | Experimental and Theoretical | Potential delay in convergence due to buffer reassignment and asynchronous updates; requires fine-tuning of buffer size and reassignment interval for optimal performance |
| [165] | Robust to inference attacks and poisoning attacks | Polymorphic encryption for securing data exchanges | Uses AES-256 with polymorphic keys for encrypting each message exchange, Gradient Descent for optimization | Medical Imaging | Simulated dataset, SHAREEDB, and Surgical binary dataset | Experimental | Limited to SVM, high computational cost due to encryption, challenges in scalability with large client networks |
| [230] | Robust to model poisoning attacks and Byzantine attacks | Adaptive model update | Adaptive tuning of local epochs, step size adjustment, and robust aggregation by supplanting outlier updates | CV and Text Classification | MNIST, CIFAR-10, Shakespeare | Experimental and Theoretical | Performance may degrade in high attack scenarios; needs further exploration of efficiency in large-scale models |
| [211] | Robust to noisy labels | Robust Optimization | Utilizes a Wasserstein distance-based certified robust region to normalize noisy labeled data, combining it with clean data for training | Intrusion detection | ISCXVPN2016 | Experimental and Theoretical | The method introduces additional computational overhead due to the DRO process and the need to estimate global distribution in each round |
| [204] | Robust to adversarial data | Client selection | Wasserstein distance-based ambiguity set for DRO, Stochastic Proximal Gradient Descent (SPGD) algorithm | CV | MNIST, Fashion MNIST, CIFAR-10, CIFAR-100 | Experimental and Theoretical | High computational complexity in large-scale settings, need for careful tuning of hyperparameters, potential challenges in non-IID data scenarios |
| [224] | Robust to Byzantine attacks | Byzantine-resilient distributed gradient descent | Use of coordinate-wise trimmed mean aggregation and gradient compression techniques to handle heavy-tailed data | Text Classification | Synthetic datasets, and Boston Housing Price dataset | Experimental and Theoretical | Challenges in scalability due to the heavy computational and communication costs, particularly when dealing with large-scale networks and datasets |
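
To make the coordinate-wise trimmed mean listed for [224] concrete, the sketch below shows the aggregation rule in isolation: for every coordinate, the `trim` smallest and `trim` largest client values are discarded and the rest are averaged. It is a generic illustration under our own assumptions (the function name, the `trim` parameter, and list-based updates); it omits the gradient compression that [224] combines with it.

```python
def trimmed_mean(updates, trim):
    """Coordinate-wise trimmed mean over a list of equally sized client updates.

    trim is the number of values dropped from EACH end per coordinate,
    so it must satisfy 0 <= trim < len(updates) / 2.
    """
    n = len(updates)
    if not 0 <= trim < n / 2:
        raise ValueError("trim must satisfy 0 <= trim < n / 2")
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)  # values of coordinate j
        kept = column[trim:n - trim]            # drop the extremes
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Hypothetical round with three honest clients and one outlier update.
updates = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [100.0, -100.0]]
print(trimmed_mean(updates, trim=1))  # [2.5, 15.0]
```

With `trim=1`, the outlier values 100.0 and -100.0 never enter the average, whereas a plain mean would be dominated by them.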

A.2.3 Aggregation Hyperparameter Tuning. We provide a comprehensive comparative unit analysis
indicating the potential gaps of the aggregation hyperparameter tuning-based approaches to
ensure the robustness of FL in Tables 20–23.


Table 7. Differential Privacy Employment-based Treatments during Model Training to Attain Robustness in FL

| Ref. | Considered Robustness | Robustness Approach | Robustness Technique | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [290] | Robust against poisoning attacks | Differential privacy | Joint differential privacy-based FL for limiting the influence of adversaries | CV | MNIST, CIFAR10 | Theoretical and Experimental | No consideration of model poisoning attacks (just limited data poisoning attacks); No non-IID consideration; No attack-robust baselines |
| [220] | Robust to backdoor attacks (specifically targeted model update poisoning attacks) | Differential privacy | Defenses contain weak differential privacy for model update attacks including unconstrained and constrained attacks for fixed-frequency attacks and random sampling | CV | EMNIST | Experimental | Just robustness evaluation in FL with existing backdoor attacks and defense; Evaluation on one dataset, EMNIST only; Limited number of defense methods; Limited backdoor attacks considered; No Non-IID setting considered |
| [148] | Robust to malicious updates | Differential Privacy followed by random sub gossip update and incentive-weighted aggregation | It encrypts local updates with Local DP (LDP), uses random gossip updates to thwart malice and selects low mean absolute error local models for global aggregation | Text Classification | Reuters, Newsgroups, and Ohsumed | Experimental | No consideration of non-IID data; Limited FL hyperparameters settings; No comparison baselines; Just adapt to asynchronous FL |
| [156] | Robust to Byzantine adversarial attacks | Differentially private summation protocol | It utilizes a shuffle protocol for summation to realize the DP encryption for exchanging model parameters and global aggregation and uses existing Byzantine-robust stochastic aggregation to aggregate the global model | CV | MNIST, FashionMNIST, and CIFAR10 | Theoretical and Experimental | Limited Byzantine attacks considered; Limited FL hyperparameters settings; Limited comparison baselines |
| [79] | Robust to model poisoning attacks | Differential privacy and Gaussian noise | VPPFL scheme with DP-based perturbation and verification mechanism | CV | MNIST, CIFAR-10 | Experimental, Theoretical | Overhead due to DP noise, scalability for large user groups |
| [65] | Robust to Byzantine attacks | DP-based training | SIREN+ framework with local DP and proactive alarming based on model accuracy | CV | Fashion-MNIST, CIFAR-10 | Experimental | Potential communication overhead and computation burden due to proactive alarming and DP mechanisms |
| [190] | Robust to adversarial attacks | Adversarial training | FedALC framework using adversarial training (AT) with logits calibration for balancing class distributions | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Limited performance on extremely unbalanced datasets; potential overhead due to adversarial training |
| [290] | Robust to model poisoning attacks and malicious clients | Differential privacy | Joint differential privacy to motivate honest participation and ensure robustness | CV | MNIST, CIFAR-10, LFW | Theoretical and Experimental | Potential issues with the practical implementation of game-theoretic mechanisms |
| [176] | Robust to Byzantine attacks and poisoning attacks | DP-based training | The proposed algorithm combines Differential Privacy with Online Mirror Descent to ensure privacy and convergence in the presence of Byzantine clients | CV | MNIST, CIFAR-10 | Both Experimental and Theoretical | Potential challenges in scalability due to the computational complexity of the proposed algorithm, especially in large-scale systems with many Byzantine clients and non-IID data scenarios |
| [297] | Robust to Byzantine attacks and poisoning attacks | DP-based training | Integrating sparsification and momentum-driven variance reduction with a client-level DP mechanism | CV and Text Classification | Fashion-MNIST, Shakespeare dataset | Both Experimental and Theoretical | High variance in the aggregated model update when dealing with large models; potential for increased computational complexity due to variance reduction techniques |
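
Several entries in Table 7 perturb client updates with differential-privacy noise. A common building block, shown here only as a generic sketch and not as the mechanism of any cited work, is to clip each update to a fixed L2 norm and then add Gaussian noise scaled to that norm. The names `clip_by_norm` and `privatize` and the `noise_multiplier` parameter are assumptions for this example, and the sketch does not calibrate the noise to a specific (ε, δ) budget.

```python
import math
import random


def clip_by_norm(update, clip):
    """Scale the update down so that its L2 norm is at most `clip`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, clip, noise_multiplier, rng):
    """Clip the update, then add zero-mean Gaussian noise per coordinate.

    The noise standard deviation is noise_multiplier * clip, so the noise
    is tied to the largest influence any single update can have.
    """
    clipped = clip_by_norm(update, clip)
    sigma = noise_multiplier * clip
    return [x + rng.gauss(0.0, sigma) for x in clipped]


# Hypothetical usage with a seeded generator for reproducibility.
rng = random.Random(0)
noisy_update = privatize([3.0, 4.0], clip=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping also bounds the magnitude of a poisoned update, which is one reason DP-style mechanisms are discussed as robustness treatments and not only as privacy tools.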


Table 8. Additional Dataset and Decentralization Orchestration-based Treatments to Attain Robustness in FL

| Ref. | Considered Robustness | Robustness Approach | Robustness Technique | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [246] | Robust to poisoning attacks | Blockchained multi-layer decentralized FL framework | Blockchain is used to secure the privacy and data safety, provide resilience on poisoning attacks to the central model, prevent the central server failure or malfunction | IoT | MNIST | Experimental | Only one toy dataset (MNIST) considered; No new aggregation method proposed; No non-IID consideration; Limited FL hyperparameters; No attack-robust baselines; Limited poisoning attacks |
| [46] | Robust to noisy and heterogeneous clients | Utilizing public data, noise-tolerant loss function, and re-weighting scheme | Heterogeneous model communication employs a public dataset for feedback alignment. Robust loss mitigates internal label noise. A novel client confidence re-weighting handles challenging noisy feedback in collaborative learning. | CV | CIFAR10, and CIFAR100 | Experimental | Requires a global public dataset; Limited experiments performed in classical FL context; limited experiments on non-IID data |
| [253] | Robust to Byzantine attacks | Reference dataset-based algorithm along with a practical Two-Filter algorithm (ToFi) | Creates a reference dataset to examine the performance of the uploaded weights computed by each node from their private dataset using training losses | CV | MNIST, CIFAR10 | Experimental | Requires a local test dataset in each client, Limited poisoning attacks tested; Non-IID dataset is not considered |
| [49] | Robust to model poisoning attacks and malicious and unreliable clients | Blockchain-enabled FL | FedAnil model: Combines clustering, blockchain technology, and encryption techniques to ensure robustness and privacy-preservation in FL | CV | Sent140, Fashion-MNIST, FEMNIST, CIFAR-10 | Experimental | The model may not fully resist advanced label-flipping attacks and large-scale collude attacks; potential for divergence in certain non-IID settings |
| [16] | Robust to model poisoning attacks and Byzantine failures | Decentralized FL | Uses blockchain and role rotation | CV | CIFAR-10, PEMS-BAY | Experimental | Need for more comprehensive testing across different types of attacks. |
| [48] | Robust to Byzantine attacks | Byzantine-resilient consensus protocol | Decentralized Byzantine-resilient consensus using Smart Contracts and blockchain, Differential Privacy for data protection | CV | MNIST | Experimental | High computational cost for blockchain operations, potential issues with scalability in large networks, requires fine-tuning for different ML models. |
| [243] | Robust to Byzantine attacks and poisoning attacks | Adaptive filtering with a shadow dataset | Local Model Filter (LMF) with adaptive filtering policy to filter poisoned models based on performance on a shadow dataset | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Scalability issues and computational overhead, especially when the ratio of malicious clients is high |
| [304] | Robust to Byzantine attacks | Supervising local models with a shadow dataset | Using a Local Model Filter (LMF) with accuracy and loss modes to filter out poisoned models | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | The approach requires a shadow dataset, and its effectiveness is influenced by the non-IID level of data. |
| [89] | Robust to model poisoning attacks and inference attacks | Blockchain-based FL | VPSA using Hyperledger Fabric to replace central curator and secure prediction mechanism | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental and Theoretical | High communication and computational overhead due to blockchain integration, scalability challenges, potential latency issues with larger networks. |
| [111] | Robust to model poisoning attacks, Byzantine attacks and inference attacks | Blockchain-based FL | Incorporates cross-validation to detect Byzantine attackers | Not specified | Custom | Experimental | High computational overhead, complexity in deployment due to the integration of multiple security mechanisms, potential scalability issues in large networks. |
| [226] | Robust to model poisoning attacks and Byzantine attacks | Decentralized Ring-Allreduce, Secret sharing | A decentralized deep FL scheme using Ring-Allreduce-based data sharing combined with threshold secret sharing | CV and HAR | MNIST, UCI-HAR | Experimental | High communication overhead due to ring structure, complexity in handling edge dropouts, scalability challenges in larger networks, especially with more complex datasets. |
| [299] | Robust to Byzantine attacks | Public auditing using a third-party auditor | FL-Auditor framework with sampling audits and lazy update mechanism | CV | MNIST, FEMNIST, CIFAR-10 | Experimental | Potential overhead due to third-party auditing; scalability challenges in larger systems |
| [141] | Robust to Byzantine attacks | Blockchain-based hierarchical aggregation | VSS-based privacy-preserving robust gradient aggregation (VPRGA) integrated with blockchain to detect and eliminate anomalous gradients | CV | MNIST | Experimental | High communication and computational overhead due to secret sharing and blockchain integration, scalability concerns with larger networks |
| [103] | Robust to Byzantine attacks and model poisoning attacks | Reverse aggregation mechanism | Iterative elimination of compromised models based on validation loss decrease using a globally shared dataset | Text Classification | Google Speech Commands | Experimental | Potential limitations when handling highly non-IID data distributions; computational overhead in iterating the reverse aggregation process |
| [39] | Robust to Byzantine clients and poisoning attacks | Blockchain-based Stochastic model aggregation | Cre-DPSIGN algorithm integrating unbiased differential privacy mechanisms (CreFlip and CreGau) with blockchain for credit management and sampling | CV | MNIST | Both Experimental and Theoretical | Challenges in setting appropriate hyperparameters such as privacy budget and learning rate, affecting accuracy and robustness under different attack scenarios |
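
Several treatments in Table 8 ([243], [253], [304]) screen uploaded models by their performance on a reference or shadow dataset. The sketch below illustrates only the general idea with a hypothetical decision rule: a client model is kept when its loss on the shadow dataset is no worse than the median loss plus a tolerance. The rule, the function name, and the `tolerance` parameter are assumptions for this example and are not the filtering policies of the cited works.

```python
import statistics


def filter_by_shadow_loss(shadow_losses, tolerance):
    """Return the ids of clients whose shadow-dataset loss looks plausible.

    shadow_losses maps a client id to the loss of that client's model on the
    server-side shadow dataset (assumed to be computed elsewhere).
    A model is kept if its loss is at most median(losses) + tolerance.
    """
    threshold = statistics.median(shadow_losses.values()) + tolerance
    return sorted(cid for cid, loss in shadow_losses.items() if loss <= threshold)


# Hypothetical losses: client "c" uploads a model that performs far worse.
losses = {"a": 0.30, "b": 0.35, "c": 2.10, "d": 0.32}
accepted = filter_by_shadow_loss(losses, tolerance=0.2)
print(accepted)  # ['a', 'b', 'd']
```

Using the median as the reference point keeps the threshold stable as long as fewer than half of the uploaded models are poisoned; as the table notes, the usefulness of such a filter depends on how well the shadow dataset represents non-IID client data.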


Table 9. Manifold Hyperparameter Tuning during Model Training-based Treatments to Attain Robustness in FL (Part 1)

| Ref. | Considered Robustness | Robustness Approach | Robustness Technique | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [313] | Robust to adversarial attacks | Client update mechanism | Maximizing the bias and variance during model update and learning the adversarially robust model updates during client update | CV | MNIST, Fashion-MNIST, CIFAR10, CIFAR100 | Theoretical and Experimental | Limited non-IID cases; Rely on extra auxiliary adversarial datasets; Increasing communication cost |
| [234] | Robust to backdoor attacks | Introducing the ways of edge-case backdoor and how to resolve this issue | Detecting backdoor attack and the reason behind it to solve the issues by tuning the model in client-side | CV, OCR, Text prediction, Sentiment analysis | CIFAR10, ARDIS, and MNIST | Theoretical and Experimental | No backdoor-robust method proposed while giving an analysis of the hardness of detecting the edge-case backdoors and proposing a new backdoor attack |
| [23] | Robust to network errors | Hyper-dimensional learning-based training | To speed up training, reduce communication costs, and improve robustness to network faults, it conducts hyper-dimensional learning on features derived from a self-supervised contrastive learning framework instead of transmitting the model and simply training the hyper-dimensional component | CV | MNIST, Fashion-MNIST and CIFAR10 | Experimental | No treatment on the models’ aggregation; Limited experiments performed in the classical FL context; No experiments on non-IID data |
| [323] | Robust to real-world malicious data | Data sharing strategy and attention augmented model | The server pre-trains a global model with a public dataset. Clients in the FL framework obtain pre-trained model parameters and uniform data. Clients select their private data subset for FL training based on received model parameters and uniform data. | Travel mode identification from GPS trajectories | Geolife GPS trajectory | Experimental | Requires a uniform dataset in the server for training, Limited experiments performed in the classical FL context; No experiments on non-IID data |
| [40] | Robust to poisoning attacks | A defense mechanism leveraging adversarial learning | Employs produced samples to evaluate the activation of potential edge-case backdoors in the model and relies on generative adversarial networks to extract data characteristics from model changes | CV | MNIST, Fashion-MNIST, and CIFAR10 | Experimental | Requires to generate augmented samples; Limited experiments performed in classical FL context; No experiments on non-IID data |
| [72] | Robust to sensor noise and environmental perturbations | Manual sanitary checks to verify the quality of data instances | Teachers use reliable instances as examples during teaching. The model, fine-tuned on these, aims for accurate predictions on them. Despite data corruption, the method integrates teaching and learning, producing a robust prediction model. | Regression | IJCNN, SUSY, CPUSMALL, and ABALONE | Experimental | Requires trusted instances; Limited experiments performed in classical FL context; Limited experiments on non-IID data |
| [311] | Robust to backdoor attack | Autonomously adjusts the learning rate | The method autonomously adjusts the learning rate based on the received parameter signs from all participants to reduce the impact of malicious updates | CV | FashionMNIST | Experimental | The hard threshold regulates the model accuracy; Limited experiments performed in classical FL context; No experiments on non-IID data |
| [159] | Robust to noisy federated weights | GAN-driven technique | A GAN detects fake federated weights, finding problematic local model updates before aggregation using min-max optimization | Content Caching | ML100K provided by MovieLens | Experimental | Limited experiments performed in classical FL context; No experiments on non-IID data, No experiment for channel noise variations and estimation for noise tolerance levels for federated weight averaging |
| [286] | Robust to Byzantine (poisoning) attacks | Extra logits-based predictive model | An adversarial setup adds a server-side predictive model for logit attribution. Simultaneously, the federated model is trained adversarially to counteract predictive behavior, reducing poisoning attack impact. | CV | MNIST, Fashion-MNIST, and CIFAR10 | Experimental | Requires a different test dataset to generate the extra logits; No experiments on non-IID data; Unable to defend against collusion poisoning attacks and meanwhile protecting local gradients from leakage |
| [174] | Robust to data poisoning attacks | Model training using the existing FedAvg | Uses existing FedAvg approach to evaluate the impacts of the attacks | CV | GTSRB and Yale Faces | Experimental | Limited data poisoning attacks are considered; Limited non-IID settings |
| [279] | Robust to backdoor attacks | Model training using the existing approaches | Focuses on quantifying and understanding the implications brought by data heterogeneity in FL backdoor attacks | CV and sentiment classification | FEMNIST, Sent140, and CIFAR10 | Experimental | Not considering non-IID settings; Limited considerations on aggregation strategies; No new development for robust FL |
| [255] | Robust to backdoor attacks with limited magnitude | Training using the existing approaches | Provides robustness guarantee from the theoretical aspects and conducts the related verification from the experimental aspects | CV and financial data classification | MNIST, EMNIST, and LOAN (financial datasets) | Theoretical and Experimental | No considering Non-IID settings; Limited considerations on aggregation strategies (only one) and other improved FL training paradigms |


Table 10. Manifold Hyperparameter Tuning during Model Training-based Treatments to Attain Robustness in FL (Part 2)

| Ref. | Considered Robustness | Robustness Approach | Robustness Technique | Application Domain | Dataset | Evaluation Type | Potential Gap |
|---|---|---|---|---|---|---|---|
| [47] | Robust to model poisoning attacks and Byzantine attacks | Personalized training | Combines FL with personalized parameter learning strategy and multi-phase CT image analysis | Medical Imaging | Internal and external CT datasets, including 1187 instances | Both Experimental and Theoretical | The study indicates potential biases in data collection and limitations due to retrospective study design. It also highlights the need for prospective validation. |
| [151] | Robust to adversarial attacks | Generative Adversarial Networks (GAN) for adversarial training | Combines GAN-based adversarial training with class probability distribution to balance natural accuracy and robustness | CV | MNIST, CIFAR-10, CIFAR-100, SVHN | Experimental | High computational cost, potential tradeoff between robustness and natural accuracy, requires tuning for different datasets |
| [260] | Robust to backdoor attacks | Graph Neural Networks (GNNs)-based training | Injection of subgraph triggers in Federated GNNs | Drug Discovery | NCI1, PROTEINS_full, TRIANGLES | Experimental | Limited scalability to larger datasets and settings; Ineffectiveness of existing defenses in mitigating backdoor attacks |
| [288] | Robust to noisy labels | Noise-aware label correction and cross-validation sampling | Noise-aware Local Model Training Mechanism: Corrects noisy labels using a Label Correction Network (LCN) and meta-learning | CV | FashionMNIST, CIFAR-10, Clothing1M | Experimental | Increased computational overhead due to label correction, reliance on cross-validation that may not fully eliminate noise, scalability issues in large datasets |
| [85] | Robust to data poisoning attacks | Encrypted verification scheme | Verification key mechanism that encrypts and verifies data integrity before training | Not specified | Custom synthetic datasets | Experimental | Potential computational overhead, scalability challenges with large datasets, reliance on encryption that may introduce complexity |
| [128] | Robust to adversarial attacks | Adversarial training | Federated Adversarial Learning with Logits Calibration (FedALC) using square root inverse class frequency for local logits adjustment | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | High communication cost in adversarial training; Future work includes broader testing with more models and datasets |
| [306] | Robust to Byzantine attacks and untargeted poisoning attacks | Ensemble learning | Ensemble learning to train multiple global models and distance-based aggregation rule to mitigate outlier updates | CV | MNIST, Fashion-MNIST | Both Experimental and Theoretical | Current defenses struggle when a large proportion of clients are malicious; economic incentives for clients are often overlooked |
| [22] | Robust to adversarial attacks | Adversarial training | Uses adversarial examples in client training and aggregating local models | Spectrum sensing | Synthesized radar and wireless communication signals | Experimental | Needs further exploration in real-world settings for broader generalization |
| [102] | Robust to Byzantine/poisoning attacks | Contrastive learning and clustering | Ensemble of contrastive models trained with dimension reduction and clustering methods | CV | MNIST, CIFAR-10, FEMNIST | Experimental | Requires suitable computational power for training contrastive models, challenges in non-IID settings |
| [293] | Robust to adversarial attacks and Byzantine attacks | Adversarial training | Adversarial training to reduce adversarial success rate and transferability | CV | MNIST, CIFAR-10 | Experimental | Balancing model utility and robustness; potential accuracy deterioration |
| [40] | Robust to backdoor attacks | Adversarial learning | ARMOR framework, including ARgan for generating synthetic datasets and MORpheus for attack mitigation | CV | MNIST, FashionMNIST, CIFAR-10 | Experimental | High computational overhead, limited evaluation on diverse real-world scenarios, potential issues with GAN training stability |
| [81] | Robust to label noise | Spiral self-paced learning | Employs a spiral self-paced learning strategy to mitigate the impact of noisy labels | Medical Imaging | COVID Radiography dataset | Experimental | Potential computational complexity, challenges in scaling to larger and more diverse datasets, dependency on correct parameter tuning for self-paced learning |
| [54] | Robust to privacy inference attacks and poisoning attacks | Secure multiparty computation | SAFEFL framework integrating MPC with robust aggregation techniques | HAR | HAR | Experimental | High communication and runtime overhead; Limited testing on more complex architectures |
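
Several entries in Table 10 rely on adversarial training, in which clients train on perturbed inputs as well as clean ones. As a generic illustration, the sketch below crafts one adversarial example with the fast gradient sign method (FGSM) for a logistic-regression model. The choice of FGSM, the logistic-regression client model, and all names and values are assumptions made to keep the example dependency-free; the cited works may use different attacks and models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one sample."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm_example(w, b, x, y, eps):
    """Perturb x by eps in the direction of the sign of the input gradient.

    For logistic regression, d(loss)/dx = (sigmoid(w.x + b) - y) * w.
    """
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
    grad_x = [err * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]


# Hypothetical model and sample: the perturbed input has a higher loss.
w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = fgsm_example(w, b, x, y, eps=0.1)
```

During adversarial training, a client would compute its local update on a mix of `x` and `x_adv`, which is the source of the extra computation and the accuracy tradeoff listed among the potential gaps.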


Table 11. Client Selection-based Treatments to Attain Robustness in FL (Part 1)


Considered Robustness Application Evaluation
Ref. Robustness Approach Dataset Potential Gap
Robustness Technique Domain Type
Staleness-aware Utilizes outcomes sent by slower components by
grouping with initially categorizing models from the same original
Robust to MNIST, Theoretical The server should have access to every
entropy-based models (with identical delay), acquiring a
[181] stragglers and CV Fashion-MNIST, and received individual model to calculate the
filtering and representative model per group. Then, aggregates
adversaries and CIFAR10 Experimental entropy and loss values
loss-weighted these representative models, considering their
averaging delays, to derive the global model.
[238] | Robust to clients’ malicious data models | Client selection followed by the modification of averaging | Model filtering for potential client selection and use of improved multi-Krum with homomorphic encryption for model averaging | Internet of Vehicle using CV dataset | MNIST | Theoretical and Experimental | Not considering other attack-robust aggregation baselines; Very limited Non-IID settings; Limited FL hyperparameters tested; Limited attack cases; Just a simple combination of existing aggregation methods
[147] | Robust to corrupted models | Client selection | Gauges clients’ contributions via layer-wise network parameter distance to the server. Standard-range distances carry higher weight. Short distances imply offline status, affecting global model convergence. | CV | FEMNIST | Experimental | High computation cost on the server; Not considering non-IID case; Relatively poor performance on non-CNN models; Limited FL hyperparameters tested; Limited corrupt model cases
[88] | Robust to adversarial models | A new consensus mechanism named PoIS to detect adversaries | Model interpretation, contribution calculation, and label-wise aggregation for effective client selection | CV | MNIST, Fashion-MNIST, and CIFAR10 | Theoretical and Experimental | Just considering data poisoning attacks without model poisoning attacks
[119] | Robust to Byzantine-attacks | Client selection | Enhances clustering-based scheme robustness via automatic update clipping. Clipping occurs before clustering using clip by norm, adjusting vectors that surpass a preset value. | CV | MNIST, CIFAR10 | Experimental | Poor performance on non-IID data; Limited improvement against Byzantine-attacks
[140] | Robust to faulty and noisy clients | Client selection | Addresses information leakage, lessens dilution from normal clients’ updates, and factors in whole population distribution during client selection | CV | MNIST | Theoretical and Experimental | Proposed a backdoor attack instead of a robust FL
[201] | Robust to model poisoning backdoor attack | Robust filtering of one-dimensional outliers | Filtering out the malicious clients’ model updates via a univariate outlier detection technique | CV | FEMNIST, CelebA-S, and CelebA-A | Experimental | Limited experiments performed in the classical FL context; No experiments on non-IID data
[67] | Robust to Byzantine-attacks | Filter useful updates, excluding malicious clients | Benign nodes are chosen based on Euclidean distance from neighbors, utilizing their own baseline rather than neighbor mean/median for calculation | CV | MNIST and CIFAR10 | Theoretical and Experimental | It is good in low computational overhead in any system, but in high computational overhead, it works normally. No experiments on non-IID data
[259] | Robust to Byzantine model poisoning attacks | Sign-gradient filtering to identify malicious gradients | It uses two gradient filters (norm-based filtering and sign-based clustering) to select the trusted gradient set; then aggregates the global gradient with existing aggregation rules, like trimmed-mean, and so on | Image and Text Classification | MNIST/Fashion-MNIST, CIFAR10, and AG-News | Theoretical and Experimental | Limited Non-IID setting (strictly its so-called Non-IID setting in the paper is a variation of IID); poor performance in the Non-IID settings
[239] | Robust to data poisoning attacks | Data-complexity-based clients’ batch detection method | To counter FL data poisoning, a 3-stage method is introduced: (1) Utilize data complexity features to form a training dataset for binary classification. (2) Train a layered detection model. (3) Detect poison data locally. | Spam Detection | UCI’s Spam dataset | Experimental | No consideration of non-IID settings; Unrealistic training of poison data batch detection model; Considers just one data poisoning case (label flipping); Just one dataset considered, on horizontal FL
[130] | Robust to malicious client | Gradient-based binary permutation client selection | Uses gradient-based binary permutation to create super nodes within a base station. Implements training and group aggregation via one-step internal synchronization. Achieves group model aggregation via multistep external synchronization | CV | FEMNIST | Theoretical and Experimental | Just one dataset considered; Limited non-IID settings; Potential failure as facing highly skewed non-IID settings within BSs; Particular FL applications
[227] | Robust to data poisoning attacks | PCA-based malicious detection method | Calculates gradient variation vs. initial global model for each local model. Applies PCA for 2D reduction. Identifies malicious gradients in reduced dimensions, excluding them from global aggregation. | CV | CIFAR10 and FMNIST | Experimental | No non-IID, only one attack considered; Limited defense due to linear PCA’s difficulty in capturing representative low-dimensional local gradient features
[145] | Robust to local model poisoning attack | Client selection | This includes 4 stages: (1) Pre-aggregation: use attack-specific aggregation for local models; (2) Detection: employ Pearson correlation for Krum, and parameter removal count for trimmed-mean and median; (3) Removal: discard malicious models; (4) Re-aggregation: aggregate the remaining models. | CV | FEMNIST | Experimental | Limited model poisoning attacks are considered; No non-IID consideration; Just one dataset tested; Poor generalization for the proposed defense method
[202] | Robust to poisoning attacks | Client selection | This method employs the Induced Ordered Weighted Averaging operator for client aggregation. Weights are determined by an induced-ordered function and linguistic quantifier, considering local model effectiveness and predefined factors. | CV | MNIST, FMNIST, and CIFAR10 | Experimental | Requires access to the raw data to design the client selection mechanism; Limited experiments performed in the classical FL context; No experiments on non-IID data
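
The entry for [119] in Table 11 mentions clipping update vectors "by norm" before clustering, so that vectors surpassing a preset value are adjusted. The following is a minimal sketch of that single step, assuming L2-norm clipping. The function name `clip_by_norm`, the threshold `max_norm`, and the toy updates are illustrative assumptions and are not taken from [119].

```python
import math


def clip_by_norm(update, max_norm):
    """Return a copy of `update` rescaled so its L2 norm is at most `max_norm`.

    Assumption: clipping uses the L2 norm; the cited work may use another norm.
    """
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= max_norm or norm == 0.0:
        return list(update)          # already within the bound, keep as is
    scale = max_norm / norm          # shrink factor in (0, 1)
    return [x * scale for x in update]


# Toy client updates (hypothetical values): the third one is abnormally large.
updates = [[3.0, 4.0], [0.3, 0.4], [30.0, 40.0]]
clipped = [clip_by_norm(u, max_norm=5.0) for u in updates]
clipped_norms = [math.sqrt(sum(x * x for x in u)) for u in clipped]
```

Clipping preserves the direction of each update and only bounds its magnitude, so a later clustering or filtering stage still sees the original update directions.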


Table 12. Client Selection-based Treatments to Attain Robustness in FL (Part 2)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[302] | Robust to model replacement attacks and backdoor attacks | Reverse engineering and global reverse trigger-based malicious clients detection | The method consists of four steps: model correction, outlier identification, global reverse trigger creation, and reverse engineering. Clients in FL reverse engineer the global models they get before each training session. | CV | MNIST and GTSRB | Experimental | Increases computation and communication costs; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[70] | Robust to failure of edge devices | Client selection | Data volume, computation power, and network bandwidth have been used as the parameters to compute the optimal client set | CV | MNIST | Experimental | Requires access to the raw data to design the client selection mechanism; Limited experiments performed in the classical FL context; No experiments on non-IID data
[222] | Robust to poisoning attacks | GAN-based countermeasure algorithms to avoid anomaly in aggregation | The approach calculates anomalous values per client and excludes those above the threshold from aggregation | IIoT | Drebin, Contagio, and Gnome | Experimental | Requires access to the raw data to design the GAN-based clients detection; Limited experiments performed in the classical FL context; No experiments on non-IID data
[295] | Robust to model poisoning attacks | Detecting malicious clients via checking their model-updates consistency | Based on previous model updates, the server forecasts a client’s model update for each iteration and flags a client as malicious if the received model update and the projected model update differ repeatedly | CV | MNIST, CIFAR10, and FEMNIST | Experimental | No discussion on the expansion to vertical FL, asynchronous FL in different domains including text and graphs, and effective recovery of the global model from model poisoning attacks after deleting the identified malicious clients; Limited experiments on non-IID data
[17] | Robust to malicious clients | Trust Bootstrapping-based malicious clients detection | The server uses a clean root dataset to build a base model. Clients’ updates receive trust scores based on alignment with the server’s update direction. Local update magnitudes are normalized, and the updates are averaged with weights given by the trust scores to form the global model update. | CV and Human Activity Recognition | MNIST, CIFAR10, and Human Activity Recognition | Theoretical and Experimental | Requires a clean dataset in the server; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[69] | Robust to malicious and unreliable clients | History of gradients-based malicious and unreliable clients detection | The key idea is to employ long-short histories of gradients along with carefully selected distance and similarity metrics during the iterative model update process | CV | MNIST and Fashion-MNIST | Experimental | No consideration of backdoor attacks; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[157] | Robust to membership and reverse attacks | Filtering out anomaly parameters from Byzantine adversaries | To provide privacy and filter out abnormal parameters from Byzantine attackers, the proposed method is built from an existing Byzantine-robust FL method and supplemented with distributed Paillier encryption and zero-knowledge proof | CV | MNIST, EMNIST, and Fashion-MNIST | Experimental | Limited experiments performed in the classical FL context; No experiments on non-IID data
[163] | Robust to poisoning attacks | Similarity measure to detect malicious gradients of the clients | Cosine similarity is used to judge the malicious gradients uploaded by malicious clients | CV | MNIST and Fashion-MNIST | Theoretical and Experimental | Limited experiments performed in the classical FL context; No experiments on non-IID data
[52] | Robust to imperfect channel state information | Device selection, transmit power allocation, and receive beamforming | In the presence of channel uncertainty, the transmit power and receive beamforming are determined by minimizing the maximum MSE. Devices with minor errors are chosen until the overall MSE is close to, but not greater than, the specified acceptable maximum aggregate error | CV | MNIST | Theoretical and Experimental | The hard threshold for tolerable maximum aggregation regulates the model accuracy; Limited experiments performed in the classical FL context; No experiments on non-IID data
[7] | Robust to noise in the models during communication steps | Sampling-based successive convex approximation | A feasible training scheme is created using a sampling-based successive convex approximation approach due to the non-convex challenges of the objective function and the unavailable maximum noise condition | CV | MNIST | Theoretical and Experimental | No consideration of non-IID data issue in the theoretical proof; Very limited experiments performed in classical FL context; No experiments on non-IID data
[300] | Robust to Byzantine (poisoning) attacks | Model-level defensive mechanism based on poisoned model detection | It selects vital model parameters, offering a concise representation of local model inputs. An online unsupervised detection approach identifies and excludes poisoned models from joining the global intrusion detection model. | Network intrusion detection | UNSW-NB15, CICIDS2018 | Experimental | Limited poisoning attacks tested; Limited experiments performed in classical FL context; Limited experiments on non-IID data
[257] | Robust to Byzantine (poisoning) attacks | Robust truth discovery aggregation scheme to remove malicious model updates | Assigns weights according to users’ contribution and uses maximum clique-based filter aggregation to guarantee global model quality | CV | MNIST, Fashion-MNIST, CIFAR10, and Clothing1M | Theoretical and Experimental | Limited experiments on non-IID data; Unable to maintain high accuracy for Byzantine majority settings; No incentive mechanism that encourages more users to participate in honest-majority settings
[144] | Robust to label-flipping attacks | Cloud-side malicious node detection mechanism | Detects malicious nodes by testing the quality of the sub-models uploaded by edge nodes on a specific testing dataset | CV | MNIST and CIFAR10 | Experimental | Requires a local test dataset in each client; Limited poisoning attacks tested; Non-IID data setting is not considered
[308] | Robust to Byzantine attacks | Malicious client detection mechanism | Identifies malicious clients by embedding a watermark to the global model and tracking the degree of watermark recession after local model training | CV | MNIST and CIFAR10 | Experimental | The detection mechanism is not optimized; Limited (two) attacks tested; Non-IID dataset is not considered
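
The entry for [17] in Table 12 describes trust scores derived from how well each client update aligns with the server's own update direction, with update magnitudes normalized before a trust-weighted average. A minimal sketch of that idea follows. It assumes cosine similarity as the alignment measure, clipping of negative similarities to zero, and rescaling each client update to the server update's norm. These choices and all names (`trust_weighted_aggregate`, the toy vectors) are illustrative assumptions, not details confirmed by the table.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_norm(a):
    return math.sqrt(dot(a, a))


def trust_weighted_aggregate(server_update, client_updates):
    """Weight client updates by their (non-negative) cosine alignment with the server update."""
    s_norm = l2_norm(server_update)
    scores, rescaled = [], []
    for u in client_updates:
        u_norm = l2_norm(u)
        if u_norm == 0.0 or s_norm == 0.0:
            scores.append(0.0)
            rescaled.append([0.0] * len(u))
            continue
        cos = dot(u, server_update) / (u_norm * s_norm)
        scores.append(max(cos, 0.0))                          # assumption: opposing updates get zero trust
        rescaled.append([x * s_norm / u_norm for x in u])     # normalize the magnitude
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(server_update), scores             # nothing trusted this round
    dim = len(server_update)
    agg = [sum(s * r[i] for s, r in zip(scores, rescaled)) / total for i in range(dim)]
    return agg, scores


# Toy example (hypothetical values): one aligned, one orthogonal, one opposing client.
server = [1.0, 0.0]
clients = [[2.0, 0.0], [0.0, 3.0], [-5.0, 0.0]]
agg, scores = trust_weighted_aggregate(server, clients)
```

In this toy run only the aligned client receives a non-zero score, so the aggregate follows the server direction even though the opposing update has the largest magnitude.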


Table 13. Client Selection-based Treatments to Attain Robustness in FL (Part 3)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[104] | Robust to model poisoning attacks and malicious and unreliable clients | History-based anomaly detection | Mitigates on-off, bad-mouthing, and good-mouthing attacks through adaptive trust scaling and anomaly detection | CV and Intrusion Detection | MNIST, CIFAR-10, KDD Cup ’99 | Both Experimental and Theoretical | Potential overreliance on history-based mechanisms that might be vulnerable to advanced evasion techniques; scalability concerns in extremely large networks
[34] | Robust to Byzantine failures and model poisoning attacks | Client selection | Presents a dynamic gradient filter algorithm to filter out malicious updates | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | The effectiveness of DGF could be reduced in high-dimensional spaces; potential reduction in efficiency in targeted attacks
[42] | Robust to label-flipping attacks | Robust aggregation and client selection | Adjusting local updates and benchmark updates, using preference values for aggregation, enhancing client selection based on loss reduction and historical performance | CV and Text Classification | Fashion-MNIST, Agnews | Experimental | The need for further adjustments for NLP tasks to enhance robustness; performance could vary based on dataset characteristics and parameter settings
[225] | Robust to model poisoning attacks and label-flipping attacks | Cosine similarity-based detection | Combines personalized logit adjustment for non-IID data and server-side poisoned client detector using cosine similarity | Intrusion detection | N-BaIoT (IoT traffic data) | Experimental | Potential computational overhead, effectiveness may diminish with high proportions of poisoned clients (>30%), challenges in extreme non-IID settings
[134] | Robust to Byzantine attacks | Client selection | Cosine similarity-based Byzantine detection | CV | MNIST, CIFAR-10 | Both Experimental and Theoretical | High computational complexity due to encryption; Challenges in scalability with large datasets and client numbers
[179] | Robust to model poisoning attacks | Hierarchical aggregation with reputation-based client selection | Selecting clients for aggregation based on reputation scores | CV and Medical Imaging | MNIST, Fashion-MNIST, Chest Xray | Experimental | Needs exploration in real-time settings with dynamic data and varying client availability
[266] | Robust to model poisoning attacks | Client selection | Introduces a new metric called GradScore to detect malicious clients by measuring the gradient norms | CV | MNIST, CIFAR-10 | Experimental | Relies on convergence to detect malicious activity; the impact on other types of attacks like label-flipping not discussed
[95] | Robust to malicious clients | Client selection | Novel dynamic reputation protocol to detect and eliminate adaptive participants | CV | MNIST, MedMNIST | Experimental | Did not address potential alternating behavior of participants; Vulnerability to 51% attack
[68] | Robust to Byzantine attacks and poisoning attacks | Client selection | Use of element-wise sign function on local model updates, Poisoning Attack Detector to filter out malicious updates | CV | MNIST, F-MNIST, CIFAR-10 | Experimental | Limited to specific types of poisoning and privacy attacks, may not generalize well to other adversarial settings, potential tradeoff between model accuracy and robustness due to masking
[63] | Robust to malicious and unreliable clients, and model poisoning attacks | Anomaly detection using autoencoders, Incentive mechanism based on credit scores | Fourier transform-based anomaly detection, Credit score-based client incentive mechanism | CV | MNIST, Fashion MNIST, CIFAR-10 | Experimental | High computational complexity for anomaly detection, potential issues with scalability in large networks, need for real-time detection mechanisms
[154] | Robust to model poisoning attacks, and malicious and unreliable clients | Client selection | Groups clients using hypermesh, employs secret sharing and Pedersen commitment for detecting and removing malicious clients | CV | MNIST, CIFAR-10, CelebA | Experimental | Potential computational overhead, scalability challenges in large networks, complexity in implementing secret sharing and commitment mechanisms
[242] | Robust to backdoor attacks | Client selection | SCFL framework using SVD for feature extraction and k-means clustering to filter out malicious updates | CV and Text Classification | MNIST, CIFAR-10, Amazon Fine Food Reviews | Experimental | Potential issues with scalability and complexity; Sensitivity to clustering parameters and Non-IID settings
[57] | Robust to poisoning attacks and Byzantine attacks | Amplification of gradients | Two approaches: AGRMP (Max Pooling) and AGRXAI (Explainable AI) to amplify and detect malicious updates | CV and Text Classification | CIFAR-10, MNIST, Fashion-MNIST, CATvsDOG, PURCHASE100, LOCATION30, TEXAS100 | Experimental | The need to balance robustness, fidelity, and efficiency; further work required to reduce the time complexity for AGRXAI
[146] | Robust to data poisoning attacks, model poisoning attacks and Byzantine attacks | Client selection | Use of Wasserstein distance to identify outlying data commitments and asynchronous model convergence checks | Not specified | Two non-IID datasets | Experimental | Limited by effectiveness in highly non-IID data settings, potential fabrication of commitments by attackers
[167] | Robust to model poisoning attacks and malicious and unreliable clients | Client selection | PCA for dimensionality reduction, Binary Tree-based Clustering with Noise (BTBCN), Self-Ensemble Detection Correction (SEDC) | CV | MNIST, EMNIST, CIFAR10, COVIDx | Experimental (extensive) | Requires setting parameters like min cluster size, potentially limited scalability to larger and more complex models
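
Several entries in Table 13 (for example [225] and [134]) rely on cosine similarity to detect poisoned or Byzantine clients. The sketch below illustrates one simple variant: each update is compared with a reference direction, and updates whose similarity falls below a threshold are flagged. Using the coordinate-wise median of all updates as the reference, and the threshold value `0.0`, are assumptions made only for illustration. The cited works may build their reference and thresholds differently.

```python
import math
from statistics import median


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def flag_by_cosine(updates, threshold=0.0):
    """Flag clients whose update points away from the coordinate-wise median update."""
    dim = len(updates[0])
    # Assumption: the coordinate-wise median serves as a robust reference direction.
    reference = [median(u[i] for u in updates) for i in range(dim)]
    sims = [cosine(u, reference) for u in updates]
    flagged = [i for i, s in enumerate(sims) if s < threshold]
    return flagged, sims


# Toy updates (hypothetical values): the last client pushes in the opposite direction.
updates = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.8], [-1.0, -1.0]]
flagged, sims = flag_by_cosine(updates)
```

A reference built from the updates themselves only stays trustworthy while honest clients form a majority, which matches the gaps the table lists for high proportions of poisoned clients.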


Table 14. Client Selection-based Treatments to Attain Robustness in FL (Part 4)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[138] | Robust to model poisoning attacks (both targeted and untargeted) | Detection and punishment of malicious clients | Detects malicious clients using cosine distances of local updates and punishes them based on historical behavior | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Challenges in defending against adaptive attacks; requires fine-tuning for non-IID data settings
[183] | Robust to model poisoning attacks | Client selection | Uses Shapley values for smart client selection | CV | FEMNIST, CelebA, Sent140, CIFAR-10 | Experimental | Complexity in computing Shapley values, potential scalability issues, challenges in handling extreme cases of data heterogeneity
[36] | Robust to Byzantine attacks and model poisoning attacks | Client selection | A Byzantine-robust FL scheme leveraging 3PC to securely select and aggregate models while preserving privacy | CV and Medical Imaging | CIFAR-10, MNIST, PathMNIST, Lung Cancer | Experimental and Theoretical | High computational overhead, potential scalability issues with large-scale datasets, complexity in implementation due to cryptographic techniques
[173] | Robust to Byzantine attacks and model poisoning attacks | Client selection | Dual filter mechanism to identify and discard malicious gradients, adaptive weight adjustment to reduce the impact of suspicious gradients, and dynamic clipping to scale gradients | CV | MNIST, Fashion-MNIST, CIFAR-10, Tiny-ImageNet | Experimental | Increased computational complexity due to the filtering and clipping mechanisms, scalability challenges in large IoT networks, dependency on the correct setting of threshold parameters
[30] | Robust to unreliable clients | Gradient Recycling | FL with Gradient Recycling (FL-GR) | CV | MNIST, CIFAR-10, CIFAR-100 | Both Experimental and Theoretical | Assumption of reliable downlink, potential high memory usage
[132] | Robust to poisoning attacks | Voting and scaling of local updates | Voting mechanism for filtering abnormal directions; Scaling mechanism for adjusting abnormal magnitudes | CV and HAR | MNIST, Fashion-MNIST, CIFAR-10, BCW, HAR | Experimental | Difficulty in converging the global model under high non-IID settings; Potential vulnerability to adaptive attacks
[284] | Robust to Byzantine attacks and stragglers | Client selection | Uses cosine-similarity-based Byzantine-robust aggregation | CV | MNIST, CIFAR-10 | Experimental | High recovery threshold; Computational overhead for Lagrange coded computing
[271] | Robust to Byzantine attacks and poisoning attacks | Enclave-aided robust aggregation | Evaluating layer-wise cosine similarity between local and guide models, using boxplot for outlier detection | CV | MNIST, CIFAR-10 | Experimental | Performance may degrade under high attack scenarios; efficiency improvements needed for larger models when using TEE
[14] | Robust to Byzantine attacks and label skewness | Two-stage robust aggregation | Stage 1 involves finding a robust subspace; Stage 2 estimates the simplex and filters outliers | CV and Text Classification | MNIST, CIFAR-10, AG-News | Experimental | Needs further exploration in more complex non-IID settings and larger datasets; potential scalability issues under extreme label skewness scenarios
[33] | Robust to model poisoning attacks and Byzantine attacks | Robust and Dynamic Gradient Filtering | The RDGF algorithm screens out malicious updates by calculating distance-based scores for client gradients, and uses geometric median aggregation and noise addition to enhance robustness | CV | MNIST | Experimental | The effectiveness of RDGF may decrease in scenarios with high variability in client behavior or extreme levels of data heterogeneity. Further optimization for varying attack patterns is needed.
[169] | Robust to Byzantine attacks | Saliency Maps and Structural Similarity Index (SSI) for detecting gradient anomalies | Compares Saliency Maps using SSI to identify and isolate contributions from potentially malicious clients | CV | CIFAR-10, MNIST, Fashion-MNIST | Experimental | Potential computational overhead due to the calculation of Saliency Maps and SSI, challenges in scaling to larger datasets or models
[312] | Robust to evasion attacks and poisoning attacks | Feature center separation, Outlier detection | Incorporates FeaS to increase feature margins between classes and Malicious Center Detection (MalD) using Gaussian Mixture Models to detect and remove malicious clients | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Potential computational overhead due to feature separation and outlier detection mechanisms, scalability challenges in large datasets, dependency on correct configuration of the detection mechanism
[264] | Robust to data and model poisoning attacks | Clustering-based aggregation, Cosine similarity | Robust Federated Clustering with cosine similarity to select similar cluster models and defend against poisoning attacks | CV | MNIST, CIFAR-10, Fashion-MNIST | Experimental | High complexity in clustering and cosine similarity calculations, potential scalability challenges, sensitivity to the number of attackers and data heterogeneity
[100] | Robust to model poisoning attacks and backdoor attacks | Client selection | DeMAC system: detecting malicious clients by measuring GradScore | CV | MNIST, CIFAR-10 | Experimental | Detection might be less effective under extremely low poisoned data rates or highly non-IID settings
[272] | Robust to model poisoning attacks, Byzantine attacks and label-flipping attacks | Adaptive model filtering based | Dynamic selection of cosine similarity and Euclidean distance-based filtering mechanisms with weighted aggregation | CV | MNIST, CIFAR-10 | Experimental | Potential performance degradation in high attack scenarios with non-IID data; needs further optimization to balance robustness and efficiency
[88] | Robust to model poisoning attacks and Byzantine attacks | Blockchain-based reputation-aware client selection | PoIS consensus mechanism based on client contributions, reputation, and past performance | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | The need to address communication overhead and scalability in large-scale implementations; further research needed on integrating data security with model interpretation
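
The entry for [33] in Table 14 mentions geometric median aggregation as one ingredient of its filtering scheme. As a hedged illustration of why the geometric median resists outliers better than the mean, the sketch below approximates it with Weiszfeld's iteration. The iteration count, the `eps` guard against division by zero, and the toy points are assumptions for this example. It does not reproduce the scoring or noise-addition steps of [33].

```python
import math


def geometric_median(points, iters=200, eps=1e-9):
    """Approximate the geometric median of `points` with Weiszfeld's algorithm."""
    dim = len(points[0])
    # Start from the arithmetic mean.
    est = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the current estimate.
        weights = []
        for p in points:
            dist = math.sqrt(sum((p[i] - est[i]) ** 2 for i in range(dim)))
            weights.append(1.0 / max(dist, eps))  # eps avoids division by zero
        total = sum(weights)
        new = [sum(w * p[i] for w, p in zip(weights, points)) / total for i in range(dim)]
        shift = math.sqrt(sum((new[i] - est[i]) ** 2 for i in range(dim)))
        est = new
        if shift < eps:
            break
    return est


# Toy updates (hypothetical values): four similar updates and one extreme outlier.
points = [[1.0, 1.0], [1.2, 1.0], [1.0, 1.2], [0.8, 0.9], [50.0, -40.0]]
gm = geometric_median(points)
mean = [sum(p[i] for p in points) / len(points) for i in range(2)]
```

In this toy run the arithmetic mean is dragged far toward the outlier, while the geometric median stays inside the cluster of similar updates.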


Table 15. Client Selection-based Treatments to Attain Robustness in FL (Part 5)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[55] | Robust to Byzantine attacks and poisoning attacks | Synthesized trust with anomaly detection | Trust synthesizing mechanism combining trust scores (TS) and confidence scores (CS) with dynamic trust ratio | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Potential issues with the root dataset’s bias; need for further optimization of the trust synthesizing mechanism
[58] | Robust to Byzantine/poisoning attacks | Client selection | Collaborative blind cross-evaluation mechanism and smart activation mechanism for detecting and isolating malicious nodes | Medical Imaging | ECG dataset | Both Experimental and Theoretical | The approach may lose iterations during cross-checking, leading to slower convergence; potential risk of false alarms during convergence; challenges in distinguishing between legitimate and Byzantine nodes
[298] | Robust to Byzantine attacks and label-flipping attacks | Client selection | Intermediate parameter compression, Update quality evaluation mechanism using cosine similarity between model updates | CV | MNIST | Both Experimental and Theoretical | The approach might lead to slower convergence and potential false alarms during model aggregation; scalability could be a challenge with an increased number of nodes
[249] | Robust to model poisoning attacks and malicious and unreliable clients | Client selection | Uses gradient similarity for malicious detection, contract theory for fair incentive distribution among participants | CV and Text Classification | MNIST, CIFAR-10, Movie Review dataset (MR) | Experimental | Computational complexity in gradient similarity calculations, potential challenges in scalability and real-time detection, balancing fairness and security in diverse networks
[27] | Robust to data and model poisoning attacks | Clustering, cosine similarity, meta-learning | RFCL (Robust Federated Clustering Learning): Groups clients’ models into clusters based on cosine similarity and uses meta-learning to aggregate models from the most similar clusters | CV | MNIST, CIFAR-10, Fashion-MNIST | Experimental | Computational complexity due to clustering, potential challenges in handling large-scale heterogeneous data, reliance on accurate clustering and similarity measures
[232] | Robust to Byzantine attacks | A four-pronged defense mechanism to mitigate Byzantine attacks | Reliable client selection using absolute similarity and spectral-based outlier detection | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | 1. Suboptimal performance when attackers dominate, 2. Lack of theoretical analysis to support the findings
[25] | Robust to Byzantine attacks, poisoning attacks and label-flipping attacks | Attention-based aggregation | Combines scaled dot-product attention mechanism with vector filter to weigh and filter gradients | CV | MNIST, FMNIST, CIFAR-10 | Experimental | Needs further refinement to handle scenarios with more than 50% malicious clients, and potential improvements in efficiency and scalability under heavy attack conditions
[292] | Robust to Byzantine attacks and poisoning attacks | Resilience through AGR | Optimizes adversarial impact while evading detection by AGRs | CV and Text Classification | CIFAR-10, Purchase100, Texas100, Location30, 1000 Genome Project | Experimental | Further research needed to develop defense mechanisms specifically targeting PMIA in various FL settings and to explore the implications of AgrEvader under different threat models
[229] | Robust to poisoning attacks and Byzantine attacks | Blockchain-based decentralized aggregation | Use of a decentralized blockchain committee for model aggregation | CV | MNIST, CIFAR-10 | Experimental | Complexity of the validation process; potential vulnerabilities in the exposure of next round’s committee members leading to targeted attacks on aggregators
[235] | Robust to model poisoning attacks and Byzantine attacks | Dynamic Clustering-based Aggregation | Sign Consistency Ratio (SCR) and Norm-based adaptive clipping to filter out malicious updates | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | The method may be less effective under extreme non-IID conditions, and there might be an increased computational cost due to clustering and adaptive clipping processes
[61] | Robust to Byzantine attacks and backdoor attacks | Anomaly detection | VAE models for detecting and excluding malicious model updates (spatial perspective), and a momentum-based update filter to generate robust global model updates | CV | MNIST, MNIST-D, FEMNIST | Experimental | Potential limitations in handling highly variable fractions of malicious clients and challenges in generalizing across different datasets and tasks
[166] | Robust to model poisoning attacks and malicious and unreliable clients | Anomaly detection | Integrates anomaly detection to filter out malicious updates | CV | MNIST, CIFAR-10 | Both Experimental and Theoretical | Potential high communication and computation overhead; scalability concerns
[301] | Robust to model poisoning attacks, Byzantine attacks and label-flipping attacks | Anomaly detection and aggregation filtering | Use of a double-aggregation mechanism combined with weighted averaging to filter out anomalies | CV | MNIST, CIFAR-10, Fashion-MNIST | Experimental | Potential high computation costs due to double aggregation; may struggle with highly sophisticated attacks
[296] | Robust to backdoor attacks | Client selection | Estimates data divergence among clients to detect backdoor clients | CV and Text Classification | SST-2, IMDB, Amazon, AgNews | Experimental | The defense performance may degrade under non-IID settings and when adaptive attacks are specifically designed against the proposed method
[319] | Robust to noisy and unreliable clients | Adaptive client selection | A Bayesian framework is used to estimate client utility by integrating the weight parameters of local models and their performance | CV and Speech Classification | CIFAR10, Google Speech, HARBox | Experimental | The effectiveness of the approach in other FL settings or with other types of data noise is not explored
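
The entry for [25] in Table 15 describes weighting gradients with a scaled dot-product attention mechanism. The sketch below shows only the generic attention formula, softmax(q·k/√d), applied to client updates that act as both keys and values. The choice of query vector (here a hypothetical reference direction such as a previous global update), the function names, and the toy values are assumptions. The vector filter mentioned in the table is not modeled.

```python
import math


def attention_weights(query, updates):
    """Softmax over scaled dot products between `query` and each client update."""
    dim = len(query)
    logits = [sum(q * x for q, x in zip(query, u)) / math.sqrt(dim) for u in updates]
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(query, updates):
    """Aggregate updates as an attention-weighted sum (updates act as keys and values)."""
    weights = attention_weights(query, updates)
    dim = len(query)
    agg = [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]
    return agg, weights


# Toy example (hypothetical values): the third update opposes the reference direction.
query = [1.0, 1.0]
updates = [[1.0, 1.0], [1.2, 0.8], [-3.0, -3.0]]
agg, weights = attention_aggregate(query, updates)
```

Because the softmax assigns exponentially smaller weight to updates that disagree with the query, the opposing update contributes almost nothing here. The outcome depends on how trustworthy the query direction is, which this sketch simply assumes.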


Table 16. Client Selection-based Treatments to Attain Robustness in FL (Part 6)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[248] | Robust to noisy label learning, class imbalance, and label noise heterogeneity | Noisy client detection | Per-class loss calculation for noisy client detection using Gaussian Mixture Model | Medical Imaging | ICH (Intracranial Hemorrhage) dataset, ISIC 2019 dataset | Experimental | Not specified explicitly; may require further investigation into different data heterogeneity patterns beyond the medical domain
[240] | Robust to Byzantine/poisoning attacks | Isolation Forest for anomaly detection | Detects malicious model updates using Isolation Forest after dimensionality reduction | CV | MNIST, CIFAR-10, Fashion-MNIST | Experimental | Potential over-reliance on dimensionality reduction, impact on convergence not fully explored
Robust to data
Hyperdimensional Potential computational overhead in
poisoning Encodes local model updates using HDC, uses
Computing (HDC) MNIST, encoding/decoding, challenges in non-IID
[91] attacks and cosine similarity-based aggregation to detect and CV Experimental
for encoding CIFAR-10 data settings, possible degradation in
poisoning filter out poisoned updates
updates non-IID accuracy
attacks
Robust to model
High computational and cryptographic
poisoning
FMNIST, overhead, complexity in implementing
attacks, Sybil Private evaluation of local models, hierarchical CV and Text
[115] Client selection CIFAR, TREC, Experimental and verifying zk-SNARKs, challenges in
attacks, and clustering for outlier removal Classification
AGNEWS scaling to larger datasets and more
untargeted and
complex models
targeted attacks
Robust to Assumes sufficient correlation difference
backdoor attacks Graph-theoretic Identifies and isolates backdoor attackers by between benign and malicious clients,
Fashion-
[196] and model approach using analyzing the correlation between clients’ weight CV Experimental may face challenges with highly
MNIST
poisoning correlation analysis updates sophisticated adversaries, and potential
attacks computational overhead
High computational complexity due to
Robust to data MNIST, clustering and similarity analysis,
Clustering with Uses clustering to group client models, cosine
and model CIFAR-10, potential scalability challenges with
[4] cosine similarity, similarity to select high-quality clusters, and CV Experimental
poisoning Fashion- larger datasets or more diverse networks,
Meta-learning meta-learning for personalization and aggregation
attacks MNIST dependency on appropriate parameter
tuning
Robust to Dynamic feature Sparse representation-based feature extraction and Organ-
Medical Limited resistance to data poisoning
[251] malicious client extraction and optimized k-means clustering for identifying and MNIST Experimental
Imaging under severe label skew conditions
attacks clustering excluding malicious clients Axial
The detection rates decrease as the
Detection and RoFL scheme using counters, validators, and
Robust to number of malicious clients increases;
[244] blocking of authenticators to identify and exclude malicious CV MNIST Experimental
malicious attacks limited validation on more complex
malicious updates clients
datasets
Robust to model
poisoning Siamese networks A Siamese network-based anomaly detection Potential computational complexity and
FEMNIST
[210] attacks and and anomaly framework using CNNs to identify malicious CV Experimental scalability issues when using CNNs for
dataset
Byzantine detection updates model detection
attacks


Table 17. Modification of Model Aggregation-based Treatments to Attain Robustness in FL (Part 1)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[228] | Robust to Byzantine clients | Modified arithmetic averaging by client’s weight adjustment | Detecting the probability distribution of the softmax layer and then penalizing the client models’ weights with abnormal distribution for aggregation | - | - | - | Limited FL hyperparameters combinations tested; Limited Non-IID cases considered; Poor performance on the highly skewed non-IID case
[117] | Robust to malicious clients | Geometric-median-based aggregation | Geometric-median-based aggregation is performed after re-weighting the models via excluding extreme outliers based on a user-defined skewness threshold | IoT | FEMNIST and Bosch IIoT | Theoretical and Experimental | Limited data and model poisoning attacks tested; Limited FL hyperparameters combinations tested; Limited Non-IID cases considered
[274] | Robust to Byzantine failure | Two techniques: median-based GD and trimmed-mean-based GD | To make distributed learning more robust against Byzantine failures, the server has two options to do the global gradient update: (i) utilize the median-based gradient descent method; or (ii) use the trimmed-mean-based gradient descent method | CV | MNIST | Theoretical and Experimental | Not considering Non-IID setting; Simple experimental setting without comprehensive parameter configurations and analysis; Only one dataset tested; Limited technical innovation as two main GD methods are not originally proposed by the authors; Limited Byzantine failures considered
[162] | Robust to corrupted models | Geometric-median-based aggregation | Trimmed softmax is used as the weighting function. In essence, the aggregation yields a global prediction that minimizes the total of the weighted Euclidean distances between the global forecast and each specific local prediction. | CV | FEMNIST | Experimental | Wrong exclusion could lead to significant damage to the model’s performance; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[15] | Robust to outlier models | Key-Resolver using medians-based aggregation | Krum approach identifies a single key model that is considered the most reliable and accurate among the group. This key model is then used to make the final decision. The Krum approach aims to mitigate the negative effects of potentially faulty or malicious models by selecting a key model that is less likely to be compromised. | CV | MNIST | Theoretical and Experimental | Sensitive to outliers and models with distinct outputs, impacting key model selection; The chosen distance metric greatly influences Krum approach performance; Limited experiments were performed in the classical FL context; No experiments on non-IID data
[135] | Robust to Byzantine attackers | Trimmed padding mean-based aggregation | The server calculates the median of uploaded parameters and uses it as the optimal global model. For the majority of benign clients, this median aligns with good client values. The server replaces trimmed parameters, maintains trimmed mean with a fixed ratio, and aggregates using averaging for the final global model. | CV | MNIST | Experimental | No study how to maintain privacy and robustness simultaneously in the case of non-IID scenario; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[187] | Robust to outlier updates | Use of the geometric median | The method is based on a robust aggregation oracle based on the geometric median, which produces a robust aggregate by iterating a normal non-robust averaging oracle a certain number of times | CV and language modeling | MNIST, CIFAR10, and Shakespeare | Theoretical and Experimental | There is no investigation of resilience in the presence of heterogeneity and personalization
[231] | Robust to outlier updates | Use of the node-wise trimmed mean | Pruning a small percentage of extreme (lowest and highest) weights at each node of the local models to perform the node-wise trimmed mean | CV and language modeling | MNIST, CIFAR10, and Shakespeare | Theoretical and Experimental | Direct pruning of weights may impede the intended robust aggregation
[206] | Robust to label-flipping attack | Zeno and coordinate-wise median aggregation | Zeno ranks clients based on loss function descent predictions for pRef. Higher-ranked clients are aggregated. Coordinate-wise median resembles average aggregation but combines weights by local model weight medians. | Device identification | RPi model device identification | Experimental | Classical adversarial attacks (evasion attacks or model poisoning attacks) not tested; Limited experiments performed in the classical FL context; No experiments on non-IID data
[2] | Robust to the variations of estimate in models parameters | Hessian matrix-based aggregation mechanism | The Hessian as a scaling matrix is used in a way analogous to Quasi-Newton approaches by utilizing the first-order information of the loss function. The influence of different levels of data quality across local models is captured by this treatment. | CV | MNIST, Fashion-MNIST, and CIFAR10 | Experimental | Increases communication cost as the Hessian matrix of the models need to be communicated among the server and clients; Limited experiments on non-IID data
[223] | Robust to adversarial attacks | Aggregation algorithm inspired by the truth inference methods | A truth inference framework estimates client update reliability to remove outliers before aggregation, countering data poisoning. Multi-round historical parameters enhance robust aggregation. | CV | MNIST, Fashion-MNIST, and CIFAR10 | Experimental | Limited poisoning attacks tested; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[160] | Robust to targeted and untargeted model poisoning attacks | Weights divergence calculation-based aggregation | Sanitizing factors based on temporal divergence and historical behaviors compute participant divergence, used to sanitize local model parameters during aggregation in the proposed method | CV | MNIST, and CIFAR10 | Experimental | Collusive poisoning attacks become unbeatable when the proportion of adversarial participants is relatively large, like about 50%; No experiments on non-IID data
[303] | Robust to Byzantine (poisoning) attacks | Median-based aggregation followed by Euclidean distance-based client sampling | Introduces a client sampling scheme for estimating adversary probability based on Euclidean distances among sampled vectors. It proposes using coordinate-wise median from honest clients to eliminate malicious updates effectively. | CV | MNIST, and CIFAR10 | Theoretical and Experimental | Experiments conducted using Intel SGX’s architecture in a LAN environment has limited memory capacity; Very limited experiments on non-IID data
[203] | Robust to backdoor model poisoning attacks | Aggregation rule modification | Analyzes several Byzantine-Tolerant aggregation methods including Krum, Multi-Krum, RFA, and Norm-Difference Clipping | CV | CIFAR-10, and EMNIST | Experimental | Not considering non-IID settings; Limited considerations on aggregation strategies; No new development for robust FL
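Several entries in Table 17 (for example [274] and [231]) rely on coordinate-wise median or trimmed-mean aggregation. The following is a minimal sketch of both rules, assuming each client update is a flat list of floats; the function names, the `trim_ratio` parameter, and the toy updates are illustrative and are not taken from any of the cited works.

```python
from statistics import median


def coordinate_wise_median(updates):
    """Aggregate client updates by taking the median of each coordinate."""
    return [median(coord) for coord in zip(*updates)]


def coordinate_wise_trimmed_mean(updates, trim_ratio):
    """Drop the lowest and highest trim_ratio fraction of values in each
    coordinate, then average what remains."""
    n = len(updates)
    k = int(trim_ratio * n)  # number of values trimmed from each end
    if 2 * k >= n:
        raise ValueError("trim_ratio removes every update")
    aggregated = []
    for coord in zip(*updates):
        kept = sorted(coord)[k:n - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Toy round (hypothetical values): four honest clients and one Byzantine
# client that submits a very large update.
client_updates = [
    [0.9, -1.1],
    [1.0, -1.0],
    [1.1, -0.9],
    [1.0, -1.0],
    [100.0, 100.0],  # Byzantine update
]

median_update = coordinate_wise_median(client_updates)
trimmed_update = coordinate_wise_trimmed_mean(client_updates, trim_ratio=0.2)
plain_mean = [sum(coord) / len(coord) for coord in zip(*client_updates)]

print("median      :", median_update)
print("trimmed mean:", trimmed_update)
print("plain mean  :", plain_mean)
```

In this toy round the plain mean is pulled far from the honest updates by the single outlier, whereas both robust rules stay close to them. The trimmed mean only tolerates outliers as long as `trim_ratio` is at least the fraction of corrupted clients.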


Table 18. Modification of Model Aggregation-based Treatments to Attain Robustness in FL (Part 2)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[320] | Robust to Byzantine attacks and model poisoning attacks | Cosine Similarity Aggregation | Convolutional Kernel Angle-based Defense Aggregation | CV | CIFAR-10, Fashion-MNIST | Experimental | Challenges in distinguishing between malicious and normal gradients in high-dimensional space; may require substantial computational resources for large models
[168] | Robust to model poisoning attacks | Randomization and aggregation techniques | Switch, Layered-Switch, and Weighted FedAvg aggregation functions | CV | CIFAR10, CIFAR100, Fashion-MNIST | Experimental | Limited testing on other datasets, scalability of the approach
[262] | Robust to Byzantine-resilient, Data poisoning attacks, Backdoor attacks | Integration of robust statistical methods | Integrates Robust Normal Aggregation (RN) module, Personalized Fusion (PF) module, normality tests, probabilistic perturbations | CV | MNIST, CIFAR-10 | Experimental | Further refinement of defense capabilities against more complex Byzantine and poisoning attacks is needed
[32] | Robust to model poisoning attacks, Byzantine attacks, privacy attacks | Quadratic Voting (QV) mechanism for aggregation | Aggregates updates based on QV, penalizes abnormal similarity scores, and allocates unequal budgets to parties based on reputation | CV | MNIST, Fashion-MNIST, FEMNIST, CIFAR10 | Both Experimental and Theoretical | Limited to specific types of attacks (model and data poisoning); may require integration with other Byzantine-robust techniques for broader robustness; computational overhead due to QV and reputation-based budgeting
[291] | Robust to Byzantine attacks | M-estimation with robust aggregation rules (e.g., Median, Trimmed Mean, Krum) | Combines gradient tracking and proximal algorithm with robust aggregation | Text Classification | Synthetic data (e.g., linear regression), Communities and Crime dataset | Experimental and Theoretical | Computational complexity in high-dimensional settings, sensitivity to choice of penalty parameters, potential scalability issues in large-scale networks
[106] | Robust to Byzantine/poisoning attacks | Reputation mechanism using Bayesian inference | Phishing method and reputation-based aggregation | CV | MNIST, Fashion-MNIST, CIFAR-10 | Experimental | Scalability to larger datasets and more complex models
[105] | Robust to poisoning attacks, Byzantine attacks and label-flipping attacks | Robust mean-based aggregation | Use of a decentralized anomaly detection mechanism and robust mean aggregation | CV | MNIST, CIFAR-10 | Experimental | High computational overhead, especially in large-scale FL environments
[199] | Robust to Byzantine failures | Enhanced median-based and trimmed-mean-based defense methods | Offloads Byzantine defense to programmable switches, implements enhanced median-based and trimmed-mean-based defense methods | CV | CIFAR-10, ResNet, GoogleNet, Transformer, VGG16, BERT | Experimental | Potential constraints due to limited memory and processing capabilities of switches, complexity in implementation and hardware dependency, scalability issues with increasing model size and network traffic
[254] | Robust to model poisoning attacks and Byzantine attacks | Two-stage aggregation | Leverages random noise for Byzantine aggregation, two-stage aggregation with filtering and gradient similarity | CV | MNIST, CIFAR-10 | Experimental and Theoretical | Assumes the existence of a small auxiliary dataset for the server, potential challenges in scaling to larger networks, computational overhead due to the combination of DP and Byzantine resilience
[280] | Robust to model poisoning attacks and Byzantine attacks | Robust mean-based aggregation | Secure cosine similarity and incentive-based Byzantine-tolerant aggregation | CV | MNIST, CIFAR-10 | Both Experimental and Theoretical | Potential computational overheads despite optimizations, especially in larger-scale implementations; needs further efficiency improvements for widespread adoption
[233] | Robust to poisoning attacks | Performance-based weighted average | FedTop algorithm: parameter normalization, performance ranking, and performance-based weighted average | CV | MNIST, CIFAR-10, YELP | Experimental and Theoretical | Limitations in dataset choosing, base model choosing, and number of malicious models
[150] | Robust to Byzantine attacks | Robust aggregation using correntropy | Maximum Correntropy Aggregation (MCA) method using fixed-point iteration | CV | MNIST, Fashion-MNIST, CIFAR-10 | Both Experimental and Theoretical | Sensitivity to choice of kernel width; Limited robustness to highly sophisticated Byzantine attacks
[90] | Robust to local model poisoning attacks | Trusted coordinate-based aggregation | FL Trusted Coordinates (FLTC), which selects trusted coordinates and aggregates based on changes in direction and magnitude | CV | MNIST, Fashion-MNIST, CIFAR-10, CINIC-10, CIFAR-100 | Experimental | The framework may not fully address other advanced attacks, and defense performance may degrade with larger datasets or more complex models
[50] | Robust to model poisoning attacks and adversarial attacks | FL under non-IID settings with robust aggregation techniques | Use of FedAvg, FedProx, and Scaffold algorithms | Intrusion Detection | CICIoT2023 | Experimental | Scalability concerns and handling of statistical heterogeneity in real-world IoT scenarios
[41] | Robust to poisoning attacks and Byzantine attacks | Confidence score-based aggregation | Scaling of gradient updates based on confidence scores derived from SoftMax probabilities | Medical Imaging | Retinal OCT Images (Kaggle) | Experimental | Needs further validation on more diverse datasets and real-world scenarios, potential impact of miscalculated confidence scores on model performance
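Krum appears among the robust aggregation rules referenced in Table 18 (for example in [291]). The sketch below illustrates the standard Krum selection rule under the usual assumption that the server knows an upper bound `num_byzantine` on the number of corrupted clients. The helper names and the toy updates are hypothetical, and the code does not reproduce the exact procedure of any cited work.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two flat update vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum(updates, num_byzantine):
    """Return (index, update) of the client whose update has the smallest
    sum of squared distances to its n - f - 2 nearest neighbours."""
    n = len(updates)
    if n < 2 * num_byzantine + 3:
        raise ValueError("Krum assumes n >= 2 * num_byzantine + 3")
    num_neighbors = n - num_byzantine - 2
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(
            squared_distance(u, v) for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:num_neighbors]))
    best = min(range(n), key=scores.__getitem__)
    return best, updates[best]


# Toy round (hypothetical values): four honest clients, one attacker.
client_updates = [
    [1.0, 1.0],
    [1.2, 0.8],
    [0.8, 1.2],
    [1.1, 1.1],
    [-5.0, -5.0],  # malicious update
]

selected_index, selected_update = krum(client_updates, num_byzantine=1)
print(selected_index, selected_update)
```

Krum returns a single client update rather than an average, which matches the sensitivity to the distance metric and to benign but atypical clients noted in the Potential Gap column for [15]. Multi-Krum, mentioned in [203], instead averages several of the lowest-scoring updates.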


Table 19. Modification of Model Aggregation-based Treatments to Attain Robustness in FL (Part 3)


Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[142] | Robust to Byzantine attacks and adversarial attacks | Robust aggregation methods | Uses coordinate-wise median and coordinate-wise trimmed mean aggregation to handle Byzantine machines and optimize models for each cluster | Simulated data | Simulated data | Both Experimental and Theoretical | Limited to strongly convex loss functions; performance might be affected by high-dimensional data and varying cluster sizes
[314] | Robust to Byzantine attacks | Byzantine-robust protocols | Introduced multiple robust estimators, including NO-REGRET, FILTERING, and GAN-based algorithms; use of bucketing and various robust estimation subroutines | CV | MNIST, Fashion-MNIST | Both Experimental and Theoretical | Higher computational complexity compared to simpler methods such as median or trimmed mean; further research needed for designing statistically optimal, computationally efficient protocols
[24] | Robust to poisoning attacks and label-flipping attacks | Generative model-based selective aggregation | Selective parameter aggregation driven by synthetic validation data from CVAEs | CV | MNIST | Experimental | Limitation in handling scenarios with a large number of malicious clients and potential overhead in communication and computation
[236] | Robust to Byzantine attacks, data poisoning, and backdoor attacks | Attention refinement with Wasserstein distance scoring | FedNAT: Using attention transfer for global aggregation and FedNAT loss function for local updates | CV and HAR | MNIST, Fashion-MNIST, HAR | Experimental | Requires clean datasets for evaluation; potential privacy concerns when collecting data; may struggle with non-IID data in extreme conditions
[19] | Robust to poisoning attacks (untargeted and targeted, including backdoor attacks) | Historical information-based recovery | Uses historical information to estimate client model updates and apply optimization strategies such as warm-up, periodic correction, abnormality fixing, and final tuning | CV and HCR | MNIST, Fashion-MNIST, Purchase, HAR | Experimental and Theoretical | Potential challenge in large-scale or highly dynamic FL systems
[273] | Robust to model poisoning and data poisoning attacks | Reputation-based robust aggregation | Two-step adversary detection: DBSCAN-based noise filtering and accuracy evaluation | CV | FEMNIST, Street-12 | Experimental | Potential limitations in handling extremely non-IID data distributions
[256] | Robust to model poisoning attacks and client-wise outliers | Bootstrap Median-of-Means (bMOM) | Enhances federated aggregation using a bootstrap sampling method combined with median-of-means estimation to handle outliers and adversarial clients | CV | FEMNIST, FedCelebA, Synthetic dataset | Experimental | Computational complexity due to bootstrapping and median-of-means calculations, scalability concerns in large-scale settings, sensitivity to block size and number of clusters
[218] | Robust to Byzantine attacks, and data and model poisoning attacks | Improved Weiszfeld aggregation algorithm | Enhanced Weiszfeld algorithm for robust aggregation, CNNs for semantic feature extraction in image transmission | CV | MNIST, EMNIST | Experimental | High computational complexity in large-scale deployments, challenges in handling a high number of Byzantine users, sensitivity to the configuration of the algorithm’s tunable parameters
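Entry [218] in Table 19 builds on the Weiszfeld algorithm, the classical fixed-point iteration for the geometric median. The sketch below shows the plain (unimproved) iteration only, assuming unweighted flat update vectors. The parameter names `num_iters`, `tol`, and `eps` are illustrative choices, and the enhancements proposed in [218] are not reproduced here.

```python
import math


def geometric_median(points, num_iters=100, tol=1e-9, eps=1e-12):
    """Approximate the geometric median with the plain Weiszfeld iteration."""
    dim = len(points[0])
    # Start from the coordinate-wise mean of all points.
    estimate = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(num_iters):
        # Each point is weighted by the inverse of its distance to the
        # current estimate; eps guards against division by zero.
        weights = [1.0 / max(math.dist(p, estimate), eps) for p in points]
        total = sum(weights)
        new_estimate = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        shift = math.dist(new_estimate, estimate)
        estimate = new_estimate
        if shift < tol:
            break
    return estimate


# Toy round (hypothetical values): four honest updates on the unit square
# and one far-away outlier.
client_updates = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [50.0, 50.0],  # outlier
]

robust_center = geometric_median(client_updates)
plain_mean = [sum(c) / len(c) for c in zip(*client_updates)]
print("geometric median:", robust_center)
print("plain mean      :", plain_mean)
```

Each iteration is a re-weighted average, which is consistent with the description of [187] in Table 17 as iterating a non-robust averaging oracle a fixed number of times.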


Table 20. Hyperparameter Tuning during Model Aggregation-based Treatments to Attain Robustness in FL (Part 1)

Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[221] | Robust to faulty and noisy clients | Filtering-out the bad weight updates | Detecting and discarding bad updates provided by the clients via the game-theoretic approach | CV | CIFAR10, MNIST, and SPAMBASE | Experimental | No non-IID data consideration; Limited FL hyperparameter combinations tested; Increase the computation cost of the server as a large number of clients
[118] | Robust to malicious clients | Auto-weighted robust FL | They propose an objective function that minimizes the weighted sum of client empirical risk, incorporating regularization. Weights are assigned based on individual risk compared to the average risk of top-performing clients. | CV | CIFAR10, FEMNIST, and Shakespeare | Theoretical and Experimental | Smaller objective hinders low-weight clients, raising fairness issues in non-IID data. No malicious client handling assumes all honesty, not realistic in FL.
[108] | Robust to malicious updates | Modifying weights during model aggregation | Clients prioritize neighbors with higher rewards, reducing cooperation with those having lower rewards to maximize overall networked rewards | Networking |  | Experimental | Limited experiments performed in the classical FL context; No experiments on non-IID case
[131] | Robust to Byzantine adversarial attacks | Robust aggregation via spatial-temporal pattern analysis (STPA) | It involves four steps: (1) calculate pairwise cosine similarity of client updates; (2) cluster using agglomerative clustering and threshold partition to remove anomalous gradients; (3) aggregate gradients using Median; (4) for cross-device FL, update global gradient using historical momentum update direction | CV, and Text Classification | MNIST, Fashion-MNIST, CIFAR10, and Spambase | Experimental | Poor performance on non-IID settings; Simple gradient similarity measure (cosine distance), which tends to be unstable
[281] | Robust to Byzantine attacks | Credibility assessment based aggregation | It involves five steps: (1) assess data verification score on shared client data; (2) compute anomaly score using adaptive anomaly detection model; (3) combine both scores for client model credibility; (4) aggregate initial global model with weighted coefficients based on credibility scores; (5) train initial global model on shared clients’ data for non-IID adaptation | CV | MNIST, and CIFAR10 | Experimental | Rely on the shared client data, which violates the data privacy; Limited experiments performed in the classical FL context
[53] | Robust to gradient leakage and model poisoning attacks | Adversarial-aware gradient aggregation method | It computes confidence scores based on the test accuracy on a verification dataset; then, in light of confidence scores, it judges which clients are malicious | CV | CIFAR100 | Experimental | No consideration of non-IID settings; Limited experimental datasets and settings; Limited baselines (just two comparison baselines); Rely on extra private shared test dataset on the server; Limited attacks considered
[110] | Robust to noisy labels | Contribution-based sample re-weighting | It transforms the problem of computing contribution via the Shapley value of each sample to a Federated Bilevel Optimization problem to realize the computation of training sample contributions; then, based on the contribution of each sample, it re-weights each samples’ contribution on the model gradient update | CV | MNIST, CIFAR10, and FEMNIST | Theoretical and Experimental | Rely on public verification dataset on the server; No consideration of non-IID data; Additional complex computation workload on the server; Extra communication cost besides the normal model gradients
[11] | Robust to targeted poisoning attacks | A reputation-based defense against poisoning attacks | It (1) uses cosine-based similarity method to measure alignment level between one client and all other clients; (2) based on the cosine similarity, adjusts adaptive learning rate of client model and computes the reputation scores; (3) according to the reputation scores of each local model, it selects the final local models to take the global aggregation | CV | MNIST, CIFAR10, and Loan | Experimental | Only one baseline method considered; Unstable cosine similarity under different non-IID settings
[77] | Robust to model poisoning attacks | Isolated forest-based anomaly detection method | It: (1) constructs decision trees for local model parameters, and computes the probability of anomalies based on search path length on iTrees; (2) employs pre-trained detection module on auxiliary data to compute the anomaly scores from the isolated forest module; (3) trims malicious nodes in iTrees and aggregates the remaining client gradients to the new global model | CV and Text Classification | MNIST, and Shakespeare | Experimental | Not considering non-IID data; Limited model poisoning attacks tested; Limited FL hyperparameters settings; Rely on auxiliary data; increasing complexity to train the auxiliary detection model; High computation cost to implement the building of the isolated forest on large and complex local model architectures
[212] | Robust to adversarial attacks | Wasserstein distance-based uncertainty set generation for aggregation | It aims to optimize model performance while preserving privacy. It permits individual model parameters and privacy budget choices for each training client. By utilizing a Wasserstein distance-based ball to create an uncertainty set and linking the privacy budget to the ball’s radius, the technique transforms the FL problem into an equivalent form. | Earning Prediction and Loan Classification | Adult and Loan from Kaggle | Theoretical and Experimental | Hard thresholding is considered for allocating privacy budget; Increases computation and communication costs; Limited experiments performed in the classical FL context; Limited experiments on non-IID data
[193] | Robust to malicious clients | Supervised anomaly detector-based aggregation | The central idea of this novel anomaly detector is to differentiate benign updates from malicious ones. By implementing the anomaly detector in FL, it becomes possible to effectively filter out harmful updates. | CV | MNIST, SVHN, and CIFAR10 | Experimental | The supervised anomaly detector in the server requires separate dataset and training; Limited experiments performed in the classical FL context; No experiments on non-IID data
[309] | Robust to Byzantine adversaries | Aggregation with an edge-based 3-layer FL architecture | Proposes to use the existing robust aggregation rule, e.g., trimmed mean | CV | MNIST, Fashion-MNIST, and CIFAR10 | Experimental | No significant contribution to the novelty of the aggregation rule; Limited experiments performed in the classical FL context; No experiments on non-IID data
[96] | Robust to noisy labels | Exploiting k-reliable neighbors with high data expertise or similarity | k-reliable neighbors can provide clean and useful information for training even when the target client’s training data is unreliable due to significant label noise, satisfying the condition of data expertise of a client, and data similarity between two clients is satisfied | CV | CIFAR10, CIFAR100, and mini-WebVision | Experimental | Requires access to other clients’ local models inside another client; No treatment on the models’ aggregation; Limited experiments performed in the classical FL context; No experiments on non-IID data
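The first three steps listed for [131] in Table 20 (pairwise cosine similarity, removal of anomalous updates, median aggregation) can be illustrated with a deliberately simplified sketch. As a loud assumption, the agglomerative clustering and threshold partition of [131] are replaced here by a plain threshold on each client's mean similarity to the others, and the momentum step (4) is omitted. All function names, the `threshold` parameter, and the toy updates are hypothetical.

```python
import math
from statistics import median


def cosine_similarity(u, v):
    """Cosine similarity between two flat update vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def filter_and_aggregate(updates, threshold=0.0):
    """Keep clients whose mean cosine similarity to the other clients exceeds
    `threshold`, then aggregate the kept updates by coordinate-wise median.

    Simplification: a fixed threshold stands in for the clustering step.
    """
    n = len(updates)
    mean_sims = []
    for i in range(n):
        sims = [
            cosine_similarity(updates[i], updates[j]) for j in range(n) if j != i
        ]
        mean_sims.append(sum(sims) / len(sims))
    kept = [i for i, s in enumerate(mean_sims) if s > threshold]
    if not kept:
        raise ValueError("every update was filtered out")
    aggregated = [median(coord) for coord in zip(*(updates[i] for i in kept))]
    return kept, aggregated


# Toy round (hypothetical values): three honest clients and one client that
# sends a sign-flipped update.
client_updates = [
    [1.0, 0.9],
    [0.9, 1.1],
    [1.1, 1.0],
    [-1.0, -1.0],  # sign-flipped update
]

kept_clients, aggregated_update = filter_and_aggregate(client_updates)
print(kept_clients, aggregated_update)
```

A similarity threshold of this kind inherits the weakness recorded in the Potential Gap column for [131] and [11]: under non-IID data, honest updates can point in quite different directions, so cosine similarity alone may misclassify them.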


Table 21. Hyperparameter Tuning during Model Aggregation-based Treatments to Attain Robustness in
FL (Part 2)
Considered Robustness Application Evaluation
Ref. Robustness Approach Dataset Potential Gap
Robustness Technique Domain Type
Utilizing the discrepancies in pairwise
penultimate layer representations (PLRs) among
model updates, a trust score estimation method
Penultimate layer
Robust to model is introduced. Each update’s trust score reflects FMNIST, Increases computation cost; Limited
representations-
[237] poisoning its alignment with the majority of trustworthy CV CIFAR10, and Experimental experiments performed in the classical FL
based
attacks clients while assigning lower scores to those KATHER context; Limited experiments on non-IID data
aggregation
distant from the benign cluster. Aggregating
model changes based on trust scores update the
global model.
MNIST, Requires multiple global models training; No
Clusters clients into different groups, trains a
Robust to CV and Human CIFAR10, and leveraging of any prior knowledge on the
Group-wise robust separate global model in each cluster, take a
[20] poisoning Activity Human Experimental learning task or the base FL algorithm when
aggregation majority vote among the global models to
attacks Recognition Activity deriving our certified security levels; Limited
classify a test input
Recognition experiments on non-IID data
Hamming Theoretical Requires a clean dataset in the server; Limited
Robust to It uses a small root-dataset and server-model for MNIST, and
[38] distance-based CV and experiments performed in the classical FL
Byzantine attack bootstrapping trust CIFAR10
aggregation Experimental context; Limited experiments on non-IID data
To determine the reconstruction errors of
Robust to Reconstruction error
updates, it employs a variational autoencoder MNIST, Increases computation and communication costs;
Byzantine attack probability
[60] model. Updates are declared malicious and CV FEMNIST, Experimental Limited experiments performed in the classical
and backdoor distribution-based
discarded if their reconstruction mistakes are and Vehicle FL context; Limited experiments on non-IID data
attack aggregation
greater than the norm.
MNIST,
A modified To counter local model poisoning, RONI and
Fashion- Requires a clean dataset in the server; No study
approach based on TRIM have been adapted. Before computing the
MNIST, on targeted poisoning attacks; Limited
Robust to local Reject on Negative global model using Byzantine-robust
CH-MNIST, performance to detect compromised local
[45] model poisoning Impact (RONI) and aggregation, both defenses remove potentially CV Experimental
and Breast models; Limited experiments performed in the
attacks TRIM jointly harmful local models. The first defense targets
Cancer classical FL context; Limited experiments on
identify a training models raising global error rates, and the second
Wisconsin non-IID data
dataset subset focuses on loss increases, validated separately.
(Diagnostic)
Through evaluating the alignment between the
server’s parameters and received ones, malicious
Learning data and MNIST, No withstand on more powerful targeted and
Robust to clients can be detected. Strong alignment
model FEMNIST, untargeted adversarial attacks; Limited
[188] label-flipping signifies reliable data, while weak alignment CV Experimental
association-based CIFAR10, and experiments performed in the classical FL
attack implies otherwise. This filtering process
aggregation APTOS context; No experiments on non-IID data
excludes malicious parameters and permits
comparable modifications.
The approach assumes that (i) the Euclidean
distance varies between benign and malicious No consideration of the detection of poisoning
Robust to Historical distance
clients, and (ii) the distances mostly resemble attacks under real non-IID data scenarios; No
[214] Byzantine detection-based CV MNIST Experimental
one another due to fewer malicious clients. Thus, strategy for how the server excludes modified
attackers aggregation
the method picks a t-client combination to distances by malicious clients
minimize their cumulative distances.
[278] | Robust to outliers or stragglers | Unification of smooth and non-smooth aggregation | The proposed approach unifies smooth (geometric median and coordinate-wise median) and non-smooth aggregation through smoothing with a parameter by employing the Moreau envelope smoothed approximation | CV | MNIST, FEMNIST, and Sent140 | Theoretical and Experimental | An in-depth exploration of neural network layer-specific aggregation is not explored; Limited experiments on non-IID data
[26] | Robust to tampered models | Hierarchical aggregation | Enhancing privacy protection, a Hierarchical Aggregation FL approach replaces the standard aggregation structure. Users split their parameters and send them to sub-aggregators. Sub-aggregators then forward user parameter parts to the main aggregator for final consolidation. | CV | Fashion-MNIST, and CIFAR10 | Experimental | Requires a sub-aggregator; Aggregation protocol is not verifiable; Limited experiments performed in the classical FL context; No experiments on non-IID data
[18] | Robust to malicious clients | Ensemble federated aggregation | The procedure is applied to a subset of random clients to learn several global models. The global models are used to reach a consensus when predicting a testing example’s label. | CV, and Human Activity Recognition | MNIST, Human Activity Recognition | Theoretical and Experimental | Requires more computation as multiple global models are trained; Limited experiments performed in the classical FL context; No experiments on non-IID data
[112] | Robust to malicious weights | Metrics learning and unsupervised clustering-based aggregation | During local training, the local data is encoded using metrics learning, and malicious data is precluded using the unsupervised clustering technique K-means | CV | Fashion-MNIST, and CIFAR10 | Experimental | Additional access to the private data of the clients is required to preclude mislabeled data inside a client; Limited experiments performed in the classical FL context; No experiments on non-IID data
[241] | Robust to transmission outage and quantization errors | Joint bandwidth and quantization bit allocation across clients | To reduce the effects of quantization errors, the channel bandwidth and quantization bits are carefully divided among the clients, subject to wireless resource limits, constraints on transmission delays, and constraints on the probabilities of transmission outages | CV | MNIST | Theoretical and Experimental | No consideration of non-IID data issue in the theoretical proof; Very limited experiments performed in the classical FL context; No experiments on non-IID data
[207] | Robust to transmission channel effects, such as noise and fading | Tracking-based stochastic approximation and analog transmission policy of local models | By successfully averaging out the noise, tracking-based stochastic approximation (SA) iteration counteracts the noise’s impact on the final convergence | CV | MNIST, and CIFAR10 | Theoretical and Experimental | The proposed analog transmission may not be suitable in some application scenarios; Non-convexity is not considered during theoretical proof; Limited experiments performed in the classical FL context; Limited experiments on non-IID data

ACM Comput. Surv., Vol. 57, No. 10, Article 245. Publication date: May 2025.

Table 22. Hyperparameter Tuning during Model Aggregation-based Treatments to Attain Robustness in
FL (Part 3)
Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[216] | Robust to Byzantine attacks | Robust transmission and aggregation | Clients are split into groups at random, and each group is given a transmission time slot. The parameter server uses the robust aggregation approach to combine the results from the various groups and then sends the results to the clients for another training session. | CV | MNIST | Theoretical and Experimental | No analysis on the effect of the imperfect channel knowledge on the proposed algorithm; Very limited experiments performed in classical FL context; No experiments on non-IID data
[268] | Robust to noisy labels | Maintains consistent decision boundaries by interchanging class-wise centroids | To ensure uniform decision boundaries, the server collaborates with local models, exchanging class-specific centroids. These centroids, representing local data features, are synchronized by the server in each communication round. This synchronization aids in forming consistent decision boundaries despite the noise distribution variations. | CV | CIFAR10, and Clothing1M | Experimental | Data privacy is compromised as the central features (class-wise centroids) of local data on each device need to be shared with the server; Limited experiments performed in classical FL context; No experiments on non-IID data
[64] | Robust to Byzantine attacks | Proactive alarming-based Byzantine-robust aggregation rules | The alarm mechanism assesses the global weights. In each communication round, clients employ their local weights and test datasets to verify the global weights. If a client detects poisoned weights, then the FL server initiates a detection process to exclude them. | CV | Fashion-MNIST, CIFAR10, Clothing1M | Experimental | Requires a local test dataset in each client; Limited experiments performed in classical FL context
[5] | Robust to Byzantine (poisoning) attacks | Distance-based outlier suppression aggregation rule | Calculates the divergence between distinct clients’ local parameter updates and computes an outlier score for each using Copula-based outlier detection. These scores are transformed into normalized weights via a softmax function, influencing the global model update through weighted averaging. | Medical Imaging | CheXpert, and HAM10000 | Experimental | Limited poisoning attacks tested; Limited experiments performed in classical FL context; No fairness among the clients is considered
[277] | Robust to Byzantine failures and malicious participants | Group-wise robust aggregation | Clusters model parameters into different groups and applies robust aggregation oracle within each cluster | CV | MNIST, CIFAR10 | Experimental | Finding the number of clusters is costly; Limited poisoning attacks tested; Limited experiments performed in classical FL context
[318] | Robust to poisoning attacks, Byzantine attacks, free-riders | Trust-based model update strategy | Trustworthy model update using local and global model updates evaluation | Industrial Cyber-Physical Systems (CPS) | Real-world industrial CPS dataset (Gas Pipeline System, Water Storage System) | Experimental | Heavy reliance on clean auxiliary datasets, scalability challenges under complex attacks
[270] | Robust to Byzantine attacks and data poisoning attacks | Weighted aggregation | Use of backdoor triggers to identify and filter malicious local models, adjustment factors based on final layer parameters, weighted aggregation | CV | MNIST, CIFAR-10 | Experimental | Potential reduction in model accuracy when backdoor triggers are embedded, requires fine-tuning of trigger proportion and detection thresholds
[125] | Robust to Byzantine attacks and model poisoning attacks | Contribution-wise Byzantine-robust aggregation rule | Dynamic assignment of honest and contribution scores to each client’s gradient, combining them to generate a final score for weighted aggregation. Trimmed-normalized aggregation method to defend against scaling attacks. | CV | MNIST, Fashion MNIST, CIFAR-10 | Experimental | Current approaches are focused on Byzantine robustness but may still have limitations under specific attack scenarios, such as when class imbalance is exploited. The method may require further exploration to enhance performance in non-IID settings with more complex datasets.
[139] | Robust to poisoning attacks | Geometric and qualitative metric-based defense | Uses cosine similarity, Euclidean distance, and distributed scoring by union clients | CV | MNIST, CIFAR-10 | Experimental | Reliance on the quality of union client data may affect robustness under certain scenarios
[122] | Robust to Byzantine attacks | Agglomerative Hierarchical Clustering (AHC), 2PC (Two-party Computation) | Uses AHC to find benign updates, 2PC for privacy-preserving aggregation | CV | MNIST, Synthetic data | Experimental | Assumes benign-majority, potential high computational complexity, requires two non-collusive aggregators
[43] | Robust to Byzantine attacks, and data and model poisoning attacks | Server-side Mixture of Experts model | Uses a gating unit within the MoE to assign weights to client updates based on their reliability | CV | MNIST, CIFAR-10 | Experimental | The method may have limited effectiveness when the server-side data is extremely scarce, and it might struggle with highly complex model structures
[267] | Robust to noisy data from clients | Bayesian framework for client utility estimation | FedTrans method using variational inference algorithm and auxiliary data | CV | CIFAR-10, Fashion-MNIST | Experimental | Dependency on auxiliary data may not cover all possible data noise patterns in clients
[99] | Robust to Byzantine attacks and model poisoning attacks | Clustering-based Krum, Sensitivity analysis of attacker count | A practical variant of Krum that estimates the number of attackers using clustering techniques and adapts Krum’s aggregation accordingly | CV | MNIST | Experimental and Theoretical | Assumes benign majority; relies on accurate clustering; computational complexity with larger networks; sensitive to clustering errors
[83] | Robust to malicious clients | Hierarchical fusion of heterogeneous models, noise mitigation | HeteroArc fusion layer for aggregating heterogeneous models | CV | MNIST, CIFAR-10 | Experimental | Potential scalability issues, computational overhead due to hierarchical processing, challenges in tuning for different levels of heterogeneity and noise


Table 23. Hyperparameter Tuning during Model Aggregation-based Treatments to Attain Robustness in
FL (Part 4)
Ref. | Considered Robustness | Robustness Technique | Robustness Approach | Application Domain | Dataset | Evaluation Type | Potential Gap
[51] | Robust to Byzantine attacks and poisoning attacks | Differential privacy | DPFA, DPLoss aggregation rule, DPVeri verification scheme | CV | MNIST, CIFAR-10 | Both Experimental and Theoretical | Potential computational overhead due to the implementation of DP and verifiability; Scalability concerns with increasing client numbers
[129] | Robust to Byzantine attacks and untargeted model poisoning attacks | Clustering-based Byzantine-robust aggregation | Mini-FL framework using grouping of Non-IID features (Geo-feature, Time-feature, User-feature) for aggregation | CV | MNIST, Fashion-MNIST | Experimental | Potential limitations in extreme Non-IID scenarios and scalability to larger datasets
[175] | Robust to Byzantine attacks | Modified coordinate-wise screening, Centerpoint aggregation | Two Byzantine-resilient aggregation techniques: Modified coordinate-wise screening, Centerpoint aggregation based on discrete geometry | Not specified | Synthetic dataset | Experimental and Theoretical | Computational challenges in high-dimensional settings, difficulty in implementing centerpoint aggregation for large networks; Scalability concerns with the number of agents
[71] | Robust to poisoning attacks | Homomorphic encryption, Secret sharing, Bilinear signatures | Mask-NFRT uses double-masking and secret sharing for dropout tolerance | Medical Imaging | UCI-PT, UCI-CT | Experimental | High computational overhead due to encryption, complexity in verification process, challenges in large-scale IoMT applications
[155] | Robust to Byzantine attacks | Homomorphic encryption, Zero-knowledge proof | Combines distributed Paillier encryption with zero-knowledge proof to ensure privacy and robustness against Byzantine workers | CV | MNIST, Fashion-MNIST, EMNIST | Experimental | High computational overhead due to encryption, potential challenges in scalability, and complexity in implementation
[124] | Robust to model poisoning attacks | Aggregation algorithm modification | Krum aggregation algorithm for detecting abnormal gradients | Power grid load forecasting | Not explicitly mentioned | Both Experimental and Theoretical | Potential scalability issues and handling of heterogeneous data
[149] | Robust to Byzantine attacks | Cryptographic protocols with optimized aggregation rules | Includes a shifted DReLU function, Additive Secret Sharing, Oblivious Transfer, Pseudo-random Generator | CV | MNIST, CIFAR-10 | Experimental | Relies on a trusted third-party arbitrator; high communication and computation costs in large-scale FL settings
[321] | Robust to model poisoning attacks | Client-side collaborative validation | FedValidate framework: Adjusts aggregation weights based on client credibility | CV | MNIST, CIFAR-10 | Experimental | Assumes gradient visibility; limited to certain attack types
[208] | Robust to model poisoning attacks | Weighting of gradients using reputation score | Uses flip-score to detect malicious clients | CV and Text Classification | MNIST, CIFAR-10, FEMNIST, Shakespeare | Experimental, Theoretical | Dependency on the assumption that gradient flips are rare in benign conditions; may not cover scenarios where benign clients exhibit similar behavior
[62] | Robust to Byzantine/poisoning attacks | Fuzzy comprehensive evaluation method | FLEvaluate method with trust evaluation mechanism | CV | CIFAR-10 | Experimental | Dependency on a clean server-side dataset for accurate trust evaluation may limit scalability
[114] | Robust to noisy labels | Parameter classification, Weighted aggregation | Classifies DNN parameters into critical and noncritical, applies different update rules, weighted aggregation based on learning efficiency | CV | CIFAR-10, Fashion-MNIST | Experimental | Potential dependency on accurate classification of parameters, may not perform well in cases with extreme noise
[219] | Robust to data poisoning attacks and malicious clients | Adaptive weighting based on Shapley value | ShapleyFL framework using surrogate federated Shapley value for client weighting | CV and Medical Imaging | CIFAR-10, Fashion-MNIST, Fed-ISIC2019 (healthcare) | Experimental | High computational cost for Shapley value calculation; Limited evaluation on complex and diverse datasets

Received 15 September 2023; revised 8 December 2024; accepted 22 March 2025


Common questions


Anomaly detection plays a critical role in defending Federated Learning models by identifying and excluding potentially malicious updates. Techniques like history-based anomaly detection, adaptive trust scaling, and autoencoders help detect and mitigate attacks such as bad-mouthing or model poisoning by analyzing client behavior and model update patterns. Effective anomaly detection helps maintain model integrity and performance in the presence of adversarial threats.
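As a rough illustration of excluding suspicious updates, the sketch below flags client updates whose distance from the coordinate-wise median is unusually large, using a median-absolute-deviation (MAD) score. This is a deliberately simple stand-in, not one of the history-based, trust-scaling, or autoencoder methods named above; the function name, the threshold of 3.0, and the sample values are all assumptions made for the example.

```python
import math
from statistics import median


def filter_outlier_updates(updates, threshold=3.0):
    """Return indices of updates that are NOT flagged as outliers.

    Hypothetical helper: scores each update by its Euclidean distance to the
    coordinate-wise median, then applies a MAD-based cut-off.
    """
    center = [median(coord) for coord in zip(*updates)]
    dists = [math.dist(u, center) for u in updates]
    med = median(dists)
    mad = median(abs(d - med) for d in dists)
    scale = mad if mad > 0 else 1e-12  # avoid division by zero
    return [i for i, d in enumerate(dists) if (d - med) / scale <= threshold]


# Four similar updates and one extreme update (made-up values).
updates = [
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.1],
    [1.0, 1.2],
    [9.0, -7.0],
]
kept = filter_outlier_updates(updates)
print(kept)
```

A hard filter like this discards information from flagged clients entirely, so in practice the threshold would need tuning, particularly when benign clients hold non-IID data.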

The main challenges in ensuring robustness in Federated Learning include: adversarial data and model attacks where clients may introduce harmful updates; the presence of noisy updates due to errors or adversarial actions; label-flipped data causing inconsistencies; and vulnerabilities to non-IID data distributions that affect model convergence and performance. Additionally, there is the challenge of designing robust aggregation methods that can handle adversarial and noisy data while maintaining privacy and efficiency.

Implementing robust aggregation strategies in FL models confronts challenges like handling data heterogeneity, malicious updates, and adversarial attacks. Solutions include developing adaptive aggregation methods that dynamically adjust based on client trust metrics, integrating anomaly detection to filter out malicious data, and using multi-party computation for secure aggregation. These strategies aim to enhance both robustness and accuracy while reducing computational overhead. Addressing these challenges ensures reliable model performance across diverse and potentially adversarial environments.
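Krum, which the tables above list as the basis of [99] and [124], is one commonly cited robust aggregation rule: it selects the single client update whose summed squared distance to its n - f - 2 nearest neighbours is smallest, where n is the number of updates and f the assumed number of Byzantine clients. The sketch below shows plain Krum only, not the clustering-based variant of [99]; the function names and sample values are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(updates, num_byzantine):
    """Return (index, update) of the update chosen by the Krum rule."""
    n = len(updates)
    closest = n - num_byzantine - 2
    # Krum's guarantees assume n >= 2f + 3; this check only guards against
    # an empty neighbour set.
    if closest < 1:
        raise ValueError("need at least num_byzantine + 3 updates")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(
            squared_distance(u, v) for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:closest]))
    best = min(range(n), key=lambda i: scores[i])
    return best, updates[best]


# Four nearby updates and one far-away update (made-up values), f = 1.
updates = [
    [1.0, 1.0],
    [1.1, 1.0],
    [0.9, 1.0],
    [1.0, 1.1],
    [10.0, 10.0],
]
best_index, chosen = krum(updates, num_byzantine=1)
print(best_index, chosen)
```

Because Krum returns one client's update rather than an average, it trades some statistical efficiency for robustness, and the pairwise distance computation grows quadratically with the number of clients.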

Client selection strategies are significant in achieving robust FL systems as they help mitigate the impact of noisy or malicious client contributions. These strategies use criteria such as model update delays, client reliability, and historical data behaviors to ensure the consistent participation of clients. By strategically choosing clients, these methods enhance the overall performance and security of the global model against adversarial and Byzantine threats. Techniques like dynamic feature extraction and adaptive trust scaling contribute to selecting high-quality and trustworthy clients.

Privacy-preserving techniques enhance Federated Learning security by enabling secure aggregation processes that guard against privacy breaches and data manipulation. Methods like encrypted verification, SAFEFL with MPC, and contrastive clustering help maintain privacy against poisoning attacks and non-IID data adversities. Techniques like Zero-knowledge Proofs and homomorphic encryption also play a crucial role in protecting model updates from Byzantine and poisoning attacks.

Non-IID data distributions present significant challenges to FL models by causing instabilities in convergence and performance degradation. These data inconsistencies lead to biased updates that can skew the global model. To mitigate these issues, advanced aggregation techniques such as Trimmed Mean and coordinate-wise median can reduce the impact of outliers. Adaptive strategies that adjust client weights based on data characteristics can also help achieve more stable convergence. These methods seek to ensure robust model performance despite the inherent disparities in client data distributions.
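To make the two named aggregators concrete, the following minimal sketch compares a plain mean with the coordinate-wise median and the trimmed mean on a handful of client updates. Updates are assumed to be flat lists of floats of equal length; the function names, the `trim` parameter, and the sample values are illustrative choices, not taken from any cited work.

```python
from statistics import median


def coordinate_wise_median(updates):
    """Take the median of each coordinate across all client updates."""
    return [median(coord) for coord in zip(*updates)]


def trimmed_mean(updates, trim):
    """Per coordinate, drop the `trim` smallest and largest values, then average."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim must be smaller than half the number of clients")
    result = []
    for coord in zip(*updates):
        kept = sorted(coord)[trim:n - trim]
        result.append(sum(kept) / len(kept))
    return result


# Four ordinary clients and one client sending an extreme update (made-up values).
client_updates = [
    [1.0, 2.0],
    [1.0, 2.0],
    [2.0, 3.0],
    [2.0, 3.0],
    [100.0, -100.0],
]
plain_mean = [sum(coord) / len(coord) for coord in zip(*client_updates)]
median_agg = coordinate_wise_median(client_updates)
trimmed_agg = trimmed_mean(client_updates, trim=1)
print(plain_mean, median_agg, trimmed_agg)
```

In this toy case the plain mean is pulled far from the majority by the single extreme client, while both robust aggregators stay close to the ordinary updates. Note that the same trimming can also suppress legitimate but atypical updates from clients with unusual data.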

Current FL approaches address noisy or heterogeneously labeled data by employing strategies like self-paced learning to mitigate label noise, especially in complex domains like medical imaging. Methods such as detecting and excluding noisy federated weights or managing label noise through reliable instances also enhance model robustness. These approaches improve the stability and performance of FL models by effectively dealing with data variability and noise.

Hybrid server-client defense mechanisms improve Federated Learning resilience by addressing vulnerabilities exploitable by sophisticated attackers who may circumvent isolated defenses. These mechanisms integrate server-side approaches like robust aggregation with client-side defenses such as local differential privacy. This dual perspective mitigates attacks that exploit either end, creating a more comprehensive security framework. For example, secure aggregation protocols can work alongside client-side anomaly detection tools, enhancing overall robustness.

To improve federated aggregation methods under highly non-IID data, adaptive weighting schemes that leverage client behavior and metrics like update consistency and loss convergence patterns are proposed. These methods dynamically adjust client contributions for more equitable aggregation. Anomaly detection integrated within aggregation processes can further improve robustness. Efficient algorithms that minimize computational overhead, such as gradient sparsification and dimensionality reduction, also contribute to improved performance.
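One simple way to picture adaptive weighting is to turn a per-client consistency score into normalized weights with a softmax and then take a weighted average, loosely in the spirit of the score-to-weight step described for [5] in the tables above. In the sketch below the score is the negative distance to the coordinate-wise median, which is an assumption made for illustration (the cited work uses Copula-based outlier scores); the `temperature` parameter and function names are likewise hypothetical.

```python
import math
from statistics import median


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_weighted_average(updates, temperature=1.0):
    """Weight each update by how close it lies to the coordinate-wise median."""
    center = [median(coord) for coord in zip(*updates)]
    scores = [-math.dist(u, center) / temperature for u in updates]
    weights = softmax(scores)
    aggregate = [
        sum(w * u[j] for w, u in zip(weights, updates))
        for j in range(len(center))
    ]
    return aggregate, weights


# Three agreeing clients and one distant client (made-up values).
updates = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [50.0, 50.0]]
aggregate, weights = adaptive_weighted_average(updates)
print(aggregate, weights)
```

Unlike a hard filter, this keeps every client in the average but shrinks the influence of distant updates; a larger `temperature` makes the weighting closer to a plain mean.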

Cryptographic protocols enhance Federated Learning security by using techniques like Additive Secret Sharing, Oblivious Transfer, and Zero-knowledge Proofs to ensure privacy and protect against Byzantine attacks. These protocols allow secure computations that aggregate client updates without exposing individual data. Homomorphic Encryption is particularly used to defend against model poisoning in non-IID data setups. These protocols maintain confidentiality and integrity of model updates during transmission and aggregation.
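The idea of aggregating updates without exposing individual values can be sketched with additive secret sharing: each client splits its value into random shares that sum to the value modulo a prime, sends one share to each of several servers, and only the combined total is ever reconstructed. The code below is a toy illustration under strong assumptions (integer-encoded scalar updates, honest-but-curious servers that do not collude, no dropout handling); the modulus, function names, and values are arbitrary choices for the example and do not reproduce the protocol of any cited work.

```python
import secrets

PRIME = 2**61 - 1  # field modulus, an arbitrary choice for this sketch


def share(value, n_shares):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value mod PRIME."""
    return sum(shares) % PRIME


def secure_sum(client_values, n_servers=3):
    """Each server adds up the shares it receives; only the total is revealed."""
    per_server = [0] * n_servers
    for value in client_values:
        for i, s in enumerate(share(value, n_servers)):
            per_server[i] = (per_server[i] + s) % PRIME
    return reconstruct(per_server)


client_values = [12, 7, 30]  # hypothetical integer-encoded updates
total = secure_sum(client_values)
print(total)
```

Real model updates are floating-point vectors, so a practical scheme would need a fixed-point encoding and per-coordinate sharing; by itself, secret sharing also hides rather than detects malicious updates, which is why the surveyed works pair it with robustness checks.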
