TY - GEN
T1 - p-ClustVal
T2 - 11th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2024
AU - Sharma, Parichit
AU - Mishra, Sarthak
AU - Kurban, Hasan
AU - Dalkilic, Mehmet
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/10/10
Y1 - 2024/10/10
N2 - This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically Single Cell RNA Sequencing (scRNASeq). By leveraging p-adic valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure from data. The transformation uses a data-centric heuristic to determine optimal parameters, without relying on ground truth labels, making it more user-friendly. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by p-adic valuation, a significant shift from conventional methods. Our comprehensive evaluation, spanning 30 experiments and over 1200 observations, shows that p-ClustVal improves performance in 91% of cases and boosts the performance of classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal's efficacy.
AB - This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically Single Cell RNA Sequencing (scRNASeq). By leveraging p-adic valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure from data. The transformation uses a data-centric heuristic to determine optimal parameters, without relying on ground truth labels, making it more user-friendly. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by p-adic valuation, a significant shift from conventional methods. Our comprehensive evaluation, spanning 30 experiments and over 1200 observations, shows that p-ClustVal improves performance in 91% of cases and boosts the performance of classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal's efficacy.
KW - Data-Centric AI
KW - Single Cell RNA Sequencing
KW - Unsupervised Learning
KW - p-Adic Numbers
UR - http://www.scopus.com/inward/record.url?scp=85209390830&partnerID=8YFLogxK
U2 - 10.1109/DSAA61799.2024.10722799
DO - 10.1109/DSAA61799.2024.10722799
M3 - Conference contribution
AN - SCOPUS:85209390830
T3 - 2024 IEEE 11th International Conference on Data Science and Advanced Analytics, DSAA 2024
BT - 2024 IEEE 11th International Conference on Data Science and Advanced Analytics, DSAA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 October 2024 through 10 October 2024
ER -