TY - GEN
T1 - A hybrid approach to improving clustering accuracy using SVM
AU - Shah, Zubair
AU - Mahmood, Abdun Naser
AU - Mustafa, Abdul K.
PY - 2013
Y1 - 2013
N2 - Support Vector Machines (SVMs) have been used in many areas such as regression, classification and novelty detection due to their accuracy and generalizability. Recently, SVMs have been proposed for clustering analysis as well. Support Vector Clustering (SVC) works by finding the minimum enclosing sphere of data points using SVM training. SVC is a boundary-based clustering method, where the support information is used to construct cluster boundaries. In support vector-based clustering algorithms, the main computational bottleneck is the high cluster labeling time for each data point. In addition, in many cases labeled data is not available for use with SVC. This tends to restrict the scalability of the method and results in decreased efficiency. It also decreases the applicability of the SVC method to real-life datasets, most of which do not have any class labels. In this paper we present a technique that uses SVM to improve the accuracy of clustering without the need for a labeled dataset. We use the K-Means clustering algorithm to generate initial labels from the data, and in the next step we train a Sequential Minimal Optimization (SMO) classifier on these labels. The original data set is then tested using the trained SMO classifier to improve classification accuracy. This process is repeated iteratively and stops when no further improvement is possible. The proposed approach is compared against the popular Stephen Winters-Hilt [1] approach and achieves 94% accuracy when applied to benchmark datasets.
AB - Support Vector Machines (SVMs) have been used in many areas such as regression, classification and novelty detection due to their accuracy and generalizability. Recently, SVMs have been proposed for clustering analysis as well. Support Vector Clustering (SVC) works by finding the minimum enclosing sphere of data points using SVM training. SVC is a boundary-based clustering method, where the support information is used to construct cluster boundaries. In support vector-based clustering algorithms, the main computational bottleneck is the high cluster labeling time for each data point. In addition, in many cases labeled data is not available for use with SVC. This tends to restrict the scalability of the method and results in decreased efficiency. It also decreases the applicability of the SVC method to real-life datasets, most of which do not have any class labels. In this paper we present a technique that uses SVM to improve the accuracy of clustering without the need for a labeled dataset. We use the K-Means clustering algorithm to generate initial labels from the data, and in the next step we train a Sequential Minimal Optimization (SMO) classifier on these labels. The original data set is then tested using the trained SMO classifier to improve classification accuracy. This process is repeated iteratively and stops when no further improvement is possible. The proposed approach is compared against the popular Stephen Winters-Hilt [1] approach and achieves 94% accuracy when applied to benchmark datasets.
KW - K-Means
KW - Labeling data
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=84881467757&partnerID=8YFLogxK
U2 - 10.1109/ICIEA.2013.6566473
DO - 10.1109/ICIEA.2013.6566473
M3 - Conference contribution
AN - SCOPUS:84881467757
SN - 9781467363211
T3 - Proceedings of the 2013 IEEE 8th Conference on Industrial Electronics and Applications, ICIEA 2013
SP - 783
EP - 788
BT - Proceedings of the 2013 IEEE 8th Conference on Industrial Electronics and Applications, ICIEA 2013
T2 - 2013 IEEE 8th Conference on Industrial Electronics and Applications, ICIEA 2013
Y2 - 19 June 2013 through 21 June 2013
ER -