TY - JOUR
T1 - Survey of Multimodal Federated Learning
T2 - Exploring Data Integration, Challenges, and Future Directions
AU - Adam, Mumin
AU - Albaseer, Abdullatif
AU - Baroudi, Uthman
AU - Abdallah, Mohamed
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2025
Y1 - 2025
AB - The rapidly expanding demand for intelligent wireless applications and the Internet of Things (IoT) requires advanced system designs to handle multimodal data effectively while ensuring user privacy and data security. Traditional machine learning (ML) models rely on centralized architectures, which, while powerful, often present significant privacy risks due to the centralization of sensitive data. Federated Learning (FL) is a promising decentralized alternative for addressing these issues. However, FL predominantly handles unimodal data, which limits its applicability in environments where devices collect and process various data types such as text, images, and sensor output. To address this limitation, Multimodal FL (MMFL) integrates multiple data modalities, enabling a richer and more holistic understanding of data. In this survey, we explore the challenges and advancements in MMFL, including data representation, fusion techniques, and cross-modal learning strategies. We present a comprehensive taxonomy of MMFL, outlining critical challenges such as modality imbalance, fusion complexity, and security concerns. Additionally, we highlight the role of transformers in MMFL by leveraging their powerful attention mechanisms to process multimodal data in a federated setting. Finally, we discuss various applications of MMFL, including healthcare, human activity recognition, and emotion recognition, and propose future research directions for improving the scalability and robustness of MMFL systems in real-world scenarios.
KW - Accuracy
KW - Computational modeling
KW - Cross-modal
KW - Data fusion
KW - Data models
KW - Data privacy
KW - Distributed databases
KW - Federated learning
KW - Internet of Things
KW - Multimodal FL
KW - Multimodal federated transformer learning
KW - Scalability
KW - Surveys
KW - Transformers
KW - multimodal FL communication intelligent IoT applications
UR - http://www.scopus.com/inward/record.url?scp=105003150636&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/record.url?scp=105001514798&partnerID=8YFLogxK
U2 - 10.1109/OJCOMS.2025.3554537
DO - 10.1109/OJCOMS.2025.3554537
M3 - Article
AN - SCOPUS:105003150636
SN - 2644-125X
VL - 6
SP - 2510
EP - 2538
JO - IEEE Open Journal of the Communications Society
JF - IEEE Open Journal of the Communications Society
ER -