TY - GEN
T1 - Exploring Semantic Hadith Overlap Across Topics
AU - Kurup, Devi G.
AU - Daoud, Amina
AU - Schneider, Jens
AU - Zaghouani, Wajdi
AU - Al Marri, Saeed Mohd H.M.
AU - Al-Absi, Hamada R.H.
AU - Mou, Younss Ait
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Semantic sentence similarity measures the degree of resemblance between multiple sentences. This similarity is a foundational element in information retrieval, machine translation, etc. This paper focuses on natural language processing techniques to analyze the semantic similarity in Hadiths, which are significant religious texts in Islam. Our objective is to investigate the extent of semantic overlap between Hadiths across various topics, with the aim to provide insights into the cohesion and interconnectedness of Hadiths. We use AraVec and GPT embeddings to represent Hadiths numerically, followed by UMAP (Uniform Manifold Approximation & Projection) to project these embeddings to 2D. The projection serves to visually interpret the relationships between Hadiths, facilitating a deeper understanding of content and semantic interrelations. Our results unveil semantic clusters and connections within Hadiths, contributing to the exploration of Islamic textual heritage through modern computational methodologies. This study suggests that GPT outperforms AraVec, providing a more advanced representation that discerns intricate semantic relationships and subtle nuances within the Hadiths.
AB - Semantic sentence similarity measures the degree of resemblance between multiple sentences. This similarity is a foundational element in information retrieval, machine translation, etc. This paper focuses on natural language processing techniques to analyze the semantic similarity in Hadiths, which are significant religious texts in Islam. Our objective is to investigate the extent of semantic overlap between Hadiths across various topics, with the aim to provide insights into the cohesion and interconnectedness of Hadiths. We use AraVec and GPT embeddings to represent Hadiths numerically, followed by UMAP (Uniform Manifold Approximation & Projection) to project these embeddings to 2D. The projection serves to visually interpret the relationships between Hadiths, facilitating a deeper understanding of content and semantic interrelations. Our results unveil semantic clusters and connections within Hadiths, contributing to the exploration of Islamic textual heritage through modern computational methodologies. This study suggests that GPT outperforms AraVec, providing a more advanced representation that discerns intricate semantic relationships and subtle nuances within the Hadiths.
KW - Arabic Natural Language Processing
KW - AraVec & GPT
KW - Hadith
KW - Hadith Corpus
KW - Semantic relatedness
KW - UMAP
UR - http://www.scopus.com/inward/record.url?scp=85219201739&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-79164-2_11
DO - 10.1007/978-3-031-79164-2_11
M3 - Conference contribution
AN - SCOPUS:85219201739
SN - 9783031791635
T3 - Communications in Computer and Information Science
SP - 127
EP - 138
BT - Arabic Language Processing
A2 - Hdioud, Boutaina
A2 - Aouragh, Si Lhoussain
PB - Springer Science and Business Media Deutschland GmbH
T2 - 8th International Conference on Arabic Language Processing, ICALP 2023
Y2 - 19 April 2024 through 20 April 2024
ER -