TY - JOUR
T1 - p-ClustVal
T2 - a novel p-adic approach for enhanced clustering of high-dimensional single-cell RNASeq data
AU - Sharma, Parichit
AU - Mishra, Sarthak
AU - Kurban, Hasan
AU - Dalkilic, Mehmet
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2025.
PY - 2025/1/14
Y1 - 2025/1/14
AB - This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically single-cell RNA sequencing (scRNASeq). By leveraging p-adic-valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure from data. The transformation uses a data-centric heuristic to determine optimal parameters, without relying on ground truth labels, making it more user-friendly. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by p-adic-valuation, a significant shift from conventional methods. Our comprehensive evaluation spanning 30 experiments and over 1400 observations shows that p-ClustVal improves performance in 91% of cases and boosts the performance of classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal’s efficacy. The study concludes with insights into the limitations of p-ClustVal and future research directions.
KW - Clustering high-dimensional data
KW - Data-centric AI
KW - Single-cell RNA sequencing
KW - Unsupervised learning
KW - p-Adic numbers
UR - http://www.scopus.com/inward/record.url?scp=85217229717&partnerID=8YFLogxK
U2 - 10.1007/s41060-024-00709-4
DO - 10.1007/s41060-024-00709-4
M3 - Article
AN - SCOPUS:85217229717
SN - 2364-415X
JO - International Journal of Data Science and Analytics
JF - International Journal of Data Science and Analytics
M1 - 604790
ER -