TY - JOUR
T1 - A Nonparametric Split and Kernel-Merge Clustering Algorithm
AU - Khan, Khurram
AU - ur Rehman, Atiq
AU - Khan, Adnan
AU - Naqvi, Syed Rameez
AU - Belhaouari, Samir Brahim
AU - Bermak, Amine
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2024
Y1 - 2024
N2 - This work proposes a novel split and kernel-merge clustering (S-KMC), a nonparametric clustering algorithm that combines the strengths of hierarchical clustering, partitional clustering, and density-based clustering. It consists of two main phases: splitting and merging. In the splitting phase, a ranking-based operator is used to divide the data into optimal subclusters. In the merging phase, a kernel function estimates the density of these subclusters after projecting them onto a straight line passing through their centers, facilitating the merging operation. S-KMC is fully nonparametric, eliminating the need for prior information about the data. It effectively handles 1) shape diversity, 2) density variability, 3) high dimensionality, 4) outliers, and 5) missing values. The algorithm offers easily tunable hyperparameters, enhancing its applicability to complex problems and robustness against data anomalies. Experimental analysis on 21 benchmark datasets demonstrates the improved performance of S-KMC in terms of cluster accuracy, handling high-dimensional data, and managing data anomalies and outliers. Comprehensive comparisons with state-of-the-art techniques further validate the superior or comparable performance of the proposed S-KMC algorithm.
AB - This work proposes a novel split and kernel-merge clustering (S-KMC), a nonparametric clustering algorithm that combines the strengths of hierarchical clustering, partitional clustering, and density-based clustering. It consists of two main phases: splitting and merging. In the splitting phase, a ranking-based operator is used to divide the data into optimal subclusters. In the merging phase, a kernel function estimates the density of these subclusters after projecting them onto a straight line passing through their centers, facilitating the merging operation. S-KMC is fully nonparametric, eliminating the need for prior information about the data. It effectively handles 1) shape diversity, 2) density variability, 3) high dimensionality, 4) outliers, and 5) missing values. The algorithm offers easily tunable hyperparameters, enhancing its applicability to complex problems and robustness against data anomalies. Experimental analysis on 21 benchmark datasets demonstrates the improved performance of S-KMC in terms of cluster accuracy, handling high-dimensional data, and managing data anomalies and outliers. Comprehensive comparisons with state-of-the-art techniques further validate the superior or comparable performance of the proposed S-KMC algorithm.
KW - Density-based clustering
KW - hierarchical clustering
KW - kernel density estimation
KW - nonparametric approaches
KW - partitional clustering
UR - http://www.scopus.com/inward/record.url?scp=85189181613&partnerID=8YFLogxK
U2 - 10.1109/TAI.2024.3382248
DO - 10.1109/TAI.2024.3382248
M3 - Article
AN - SCOPUS:85189181613
SN - 2691-4581
VL - 5
SP - 4443
EP - 4457
JO - IEEE Transactions on Artificial Intelligence
JF - IEEE Transactions on Artificial Intelligence
IS - 9
ER -