Software clustering using automated feature subset selection

Zubair Shah, Rashid Naseem, Mehmet A. Orgun, Abdun Mahmood, Sara Shahzad

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

13 Citations (Scopus)

Abstract

This paper proposes a feature selection technique for software clustering that can be used in the architecture recovery of software systems. The recovered architecture can then be used in the subsequent phases of software maintenance, reuse and re-engineering. Many diverse features can be extracted from the source code of a software system. However, some of the extracted features may carry little information for computing associations between entities, which lowers the quality of the resulting software clusters. Therefore, further research is required to select those features that are highly relevant for finding associations between entities. In this article we first propose a supervised feature selection technique for unlabeled data, and then apply this technique to software clustering. A number of feature subset selection techniques have been proposed for software architecture recovery, but none of them focuses on automated feature selection in this domain. Experimental results on three software test systems show that the proposed approach produces results that are closer to the decompositions prepared by human experts than those discovered by the well-known K-Means algorithm.
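The abstract does not state the selection criterion, so the following is only an illustrative sketch of why dropping low-information features can change the associations between entities. It is not the authors' method. It assumes binary feature vectors, uses a simple frequency filter as a stand-in for feature selection, and uses Jaccard similarity as the association measure. The file names, feature vectors, thresholds, and the helper names `select_features` and `jaccard` are all hypothetical.

```python
from typing import Dict, List, Sequence


def select_features(entities: Dict[str, List[int]],
                    min_frac: float = 0.2,
                    max_frac: float = 0.8) -> List[int]:
    """Keep feature indices whose share of 1s lies in [min_frac, max_frac].

    Assumption: a feature present in (almost) every entity, or in
    (almost) none, says little about which entities belong together.
    """
    n_entities = len(entities)
    n_features = len(next(iter(entities.values())))
    kept = []
    for j in range(n_features):
        frac = sum(vec[j] for vec in entities.values()) / n_entities
        if min_frac <= frac <= max_frac:
            kept.append(j)
    return kept


def jaccard(a: Sequence[int], b: Sequence[int],
            features: Sequence[int]) -> float:
    """Jaccard similarity of two binary vectors over the given features."""
    both = sum(1 for j in features if a[j] and b[j])
    either = sum(1 for j in features if a[j] or b[j])
    return both / either if either else 0.0


# Hypothetical toy data: four entities, four binary features.
# Feature 0 is present everywhere, so it cannot separate the entities.
entities = {
    "parser.c": [1, 1, 0, 1],
    "lexer.c":  [1, 1, 0, 0],
    "net.c":    [1, 0, 1, 0],
    "socket.c": [1, 0, 1, 0],
}

all_features = [0, 1, 2, 3]
kept = select_features(entities)

before = jaccard(entities["parser.c"], entities["net.c"], all_features)
after = jaccard(entities["parser.c"], entities["net.c"], kept)
print(kept, before, after)
```

In this toy data the ubiquitous feature makes `parser.c` and `net.c` look partly similar. Once it is filtered out, the two entities share no features, while `net.c` and `socket.c` remain identical over the kept features.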

Original language: English
Title of host publication: Advanced Data Mining and Applications - 9th International Conference, ADMA 2013, Proceedings
Pages: 47-58
Number of pages: 12
Edition: PART 2
DOIs
Publication status: Published - 2013
Externally published: Yes
Event: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013 - Hangzhou, China
Duration: 14 Dec 2013 to 16 Dec 2013

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8347 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Advanced Data Mining and Applications, ADMA 2013
Country/Territory: China
City: Hangzhou
Period: 14/12/13 to 16/12/13

Keywords

  • Feature Selection
  • K-Means
  • Software Clustering
