p-ClustVal: a novel p-adic approach for enhanced clustering of high-dimensional single-cell RNASeq data

Parichit Sharma*, Sarthak Mishra, Hasan Kurban, Mehmet Dalkilic

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically single-cell RNA sequencing (scRNASeq). By leveraging p-adic-valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure from data. The transformation uses a data-centric heuristic to determine optimal parameters, without relying on ground truth labels, making it more user-friendly. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by p-adic-valuation, a significant shift from conventional methods. Our comprehensive evaluation spanning 30 experiments and over 1400 observations shows that p-ClustVal improves performance in 91% of cases and boosts the performance of classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal’s efficacy. The study concludes with insights into the limitations of p-ClustVal and future research directions.
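For readers unfamiliar with the term, the p-adic valuation v_p(n) of a nonzero integer n is the largest exponent k such that p^k divides n, and the associated p-adic absolute value is |n|_p = p^(-v_p(n)). The sketch below only illustrates these standard definitions. It is not the p-ClustVal transformation itself, whose details and parameter-selection heuristic are not given in this abstract. The function names are hypothetical and chosen for illustration.

```python
def p_adic_valuation(n: int, p: int) -> int:
    """Return v_p(n): the largest k such that p**k divides n (n must be nonzero)."""
    if p < 2:
        raise ValueError("p must be an integer >= 2 (a prime in the classical definition)")
    if n == 0:
        raise ValueError("v_p(0) is conventionally +infinity")
    n = abs(n)
    k = 0
    # Strip factors of p one at a time, counting how many were removed.
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_abs(n: int, p: int) -> float:
    """Return the p-adic absolute value |n|_p = p**(-v_p(n)), with |0|_p = 0."""
    if n == 0:
        return 0.0
    return float(p) ** (-p_adic_valuation(n, p))


if __name__ == "__main__":
    # 12 = 2^2 * 3, so it is divisible by 2 twice and by 3 once.
    print(p_adic_valuation(12, 2), p_adic_valuation(12, 3))
    # Integers highly divisible by p are "small" in the p-adic sense.
    print(p_adic_abs(8, 2), p_adic_abs(7, 2))
```

Under this absolute value, two integers are close when their difference is divisible by a high power of p, which is a different notion of proximity from the Euclidean one.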

Original language: English
Article number: 604790
Journal: International Journal of Data Science and Analytics
Early online date: Jan 2025
Publication status: Published - 14 Jan 2025

Keywords

  • Clustering high-dimensional data
  • Data-centric AI
  • Single-cell RNA sequencing
  • Unsupervised learning
  • p-Adic numbers
