Optimized tree-classification algorithm for classification of protein sequences

Muhammad Javed Iqbal, Ibrahima Faye, Abas Md Said, Brahim Belhaouari Samir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Computational intelligence is an active area of research that has been successfully applied to the analysis and modeling of the vast amount of biological data accumulated by high-throughput genome sequencing projects. The data consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and growing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequenced data. Existing classification methods perform well on very limited data; however, the rapid growth of genomic data calls for improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97-98% on the previously used dataset, which is taken from the well-known UniProtKB database.
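As a rough illustration of what an alignment-free, nearest-neighbor style classifier can look like, the sketch below maps each sequence to its amino-acid composition and assigns a query to the superfamily whose cluster centre is closest. The abstract does not specify the features, the clustering, the tree structure or the optimization step, so everything here is an assumption: the composition features, the one-centroid-per-class simplification, the function names and the toy sequences and labels are all hypothetical and are not the authors' method or data.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Alignment-free features: relative frequency of each standard amino acid."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_centroids(labelled):
    """Assumed simplification: one cluster centre per superfamily label."""
    groups = {}
    for seq, label in labelled:
        groups.setdefault(label, []).append(composition(seq))
    return [(centroid(vs), label) for label, vs in groups.items()]


def classify(seq, centres, k=1):
    """Majority vote among the k cluster centres nearest to the query."""
    x = composition(seq)
    nearest = sorted(centres, key=lambda c: distance(x, c[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up sequences and labels purely for illustration.
training = [
    ("AAAAGGAA", "A-rich"),
    ("AAGAAAAG", "A-rich"),
    ("KKKKLLKK", "K-rich"),
    ("KLKKKKLK", "K-rich"),
]
centres = fit_centroids(training)
print(classify("AAAGAAAA", centres))  # prints "A-rich"
```

Comparing a query against a few cluster centres rather than every training sequence keeps the cost per query proportional to the number of clusters, which is one common motivation for cluster-based nearest-neighbor schemes on large sequence databases.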

Original language: English
Title of host publication: 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Proceedings
Editors: Mohammad Nasir Abdullah, Mohamed Imran Bin Mohamed Ariff, Izzatdin Abdul Aziz, Jafreezal Jaafar, Noreen Izza Binti Arshad, Siti Khadijah Nor Abdul Rahim
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 110-115
Number of pages: 6
ISBN (Electronic): 9781479978946
DOIs
Publication status: Published - 18 Oct 2016
Externally published: Yes
Event: 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Ipoh, Malaysia
Duration: 19 May 2015 to 20 May 2015

Publication series

Name: 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Proceedings

Conference

Conference: 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015
Country/Territory: Malaysia
City: Ipoh
Period: 19/05/15 to 20/05/15

Keywords

  • Feature extraction
  • Genetic Algorithms
  • Protein Classification
  • Superfamily
  • Tree-Classification
