TY - GEN
T1 - Optimized tree-classification algorithm for classification of protein sequences
AU - Iqbal, Muhammad Javed
AU - Faye, Ibrahima
AU - Said, Abas Md
AU - Belhaouari Samir, Brahim
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/10/18
Y1 - 2016/10/18
N2 - Computational intelligence is an active area of research that has been successfully applied to the analysis and modeling of the vast amount of biological data accumulated by high-throughput genome sequencing projects. The data gathered consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and increasing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequenced data. Existing classification methods perform well on very limited data; however, the rapid growth of genomic data calls for improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97-98% on a previously used dataset taken from the well-known UniProtKB database.
AB - Computational intelligence is an active area of research that has been successfully applied to the analysis and modeling of the vast amount of biological data accumulated by high-throughput genome sequencing projects. The data gathered consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and increasing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequenced data. Existing classification methods perform well on very limited data; however, the rapid growth of genomic data calls for improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97-98% on a previously used dataset taken from the well-known UniProtKB database.
KW - Feature extraction
KW - Genetic Algorithms
KW - Protein Classification
KW - Superfamily
KW - Tree-Classification
UR - http://www.scopus.com/inward/record.url?scp=84995663140&partnerID=8YFLogxK
U2 - 10.1109/ISMSC.2015.7594037
DO - 10.1109/ISMSC.2015.7594037
M3 - Conference contribution
AN - SCOPUS:84995663140
T3 - 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Proceedings
SP - 110
EP - 115
BT - 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015 - Proceedings
A2 - Abdullah, Mohammad Nasir
A2 - Ariff, Mohamed Imran Bin Mohamed
A2 - Aziz, Izzatdin Abdul
A2 - Jaafar, Jafreezal
A2 - Arshad, Noreen Izza Binti
A2 - Rahim, Siti Khadijah Nor Abdul
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 International Symposium on Mathematical Sciences and Computing Research, iSMSC 2015
Y2 - 19 May 2015 through 20 May 2015
ER -