A distance-based feature-encoding technique for protein sequence classification in bioinformatics

Muhammad Javed Iqbal, Ibrahima Faye, Abas Md Said, Brahim Belhaouari Samir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Citations (Scopus)

Abstract

Since the last century, bioinformatics has been emerging as a new research field that combines techniques from computer science and biology for the automatic analysis of biological sequence data. The volume of biological data gathered by sequencing projects is increasing exponentially. These sequences contain extremely important information about genes, their structure, and their function. Computational techniques involving machine learning and pattern recognition are becoming very useful for bioinformatics data such as DNA and protein sequences. Classifying proteins into groups can help determine the structure or function of an unknown protein sequence. Classifying protein amino acid sequences into a family or superfamily is a very complex problem. Among the major issues in protein classification, the critical one is the accurate representation of the amino acid sequence during feature extraction. In this work, we propose a distance-based feature-encoding method. The technique was tested with different classifiers and showed better results than previously available techniques for superfamily classification of protein sequences. The maximum average classification accuracy obtained was 91.2%. The dataset used in the experiments was taken from the well-known UniProtKB protein database.

Original language: English
Title of host publication: Proceeding - IEEE CYBERNETICSCOM 2013
Subtitle of host publication: IEEE International Conference on Computational Intelligence and Cybernetics
Publisher: IEEE Computer Society
Pages: 1-5
Number of pages: 5
ISBN (Print): 9781467360531
Publication status: Published - 2013
Externally published: Yes
Event: 2nd IEEE International Conference on Computational Intelligence and Cybernetics, IEEE CYBERNETICSCOM 2013 - Yogyakarta, Indonesia
Duration: 3 Dec 2013 to 4 Dec 2013

Publication series

Name: Proceeding - IEEE CYBERNETICSCOM 2013: IEEE International Conference on Computational Intelligence and Cybernetics

Conference

Conference: 2nd IEEE International Conference on Computational Intelligence and Cybernetics, IEEE CYBERNETICSCOM 2013
Country/Territory: Indonesia
City: Yogyakarta
Period: 3/12/13 to 4/12/13

Keywords

  • Bioinformatics
  • Data mining
  • Feature-encoding
  • Protein classification
  • Superfamily
