TY - GEN
T1 - Software clustering using automated feature subset selection
AU - Shah, Zubair
AU - Naseem, Rashid
AU - Orgun, Mehmet A.
AU - Mahmood, Abdun
AU - Shahzad, Sara
PY - 2013
Y1 - 2013
N2 - This paper proposes a feature selection technique for software clustering which can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse and re-engineering. A number of diverse features can be extracted from the source code of software systems. However, some of the extracted features may carry little information about the entities, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features which are highly relevant for finding associations between entities. In this article, we first propose a supervised feature selection technique for unlabeled data and then apply this technique to software clustering. A number of feature subset selection techniques in software architecture recovery have been proposed. However, none of them focuses on automated feature selection in this domain. Experimental results on three software test systems reveal that our proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
AB - This paper proposes a feature selection technique for software clustering which can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse and re-engineering. A number of diverse features can be extracted from the source code of software systems. However, some of the extracted features may carry little information about the entities, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features which are highly relevant for finding associations between entities. In this article, we first propose a supervised feature selection technique for unlabeled data and then apply this technique to software clustering. A number of feature subset selection techniques in software architecture recovery have been proposed. However, none of them focuses on automated feature selection in this domain. Experimental results on three software test systems reveal that our proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
KW - Feature Selection
KW - K-Means
KW - Software Clustering
UR - http://www.scopus.com/inward/record.url?scp=84893124450&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-53917-6_5
DO - 10.1007/978-3-642-53917-6_5
M3 - Conference contribution
AN - SCOPUS:84893124450
SN - 9783642539169
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 47
EP - 58
BT - Advanced Data Mining and Applications - 9th International Conference, ADMA 2013, Proceedings
T2 - 9th International Conference on Advanced Data Mining and Applications, ADMA 2013
Y2 - 14 December 2013 through 16 December 2013
ER -