TY - GEN
T1 - DeepCrystal
T2 - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
AU - Elbasir, Abdurrahman
AU - Moovarkumudalvan, Balasubramanian
AU - Kunji, Khalid
AU - Kolatkar, Prasanna R.
AU - Bensmail, Halima
AU - Mall, Raghvendra
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2019/1/21
Y1 - 2019/1/21
N2 - Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensities of proteins based on their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can potentially explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to discriminate proteins that produce diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and MCC on three independent test sets. DeepCrystal achieves an average improvement in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC over Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web server is also available at https://deeplearning-protein.qcri.org.
AB - Protein structure determination has primarily been performed using X-ray crystallography. To overcome the high cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensities of proteins based on their sequences. However, the majority of these methods build predictors by extracting features from protein sequences, which is computationally expensive and can potentially explode the feature space. We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on Convolutional Neural Networks (CNNs), which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to discriminate proteins that produce diffraction-quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and MCC on three independent test sets. DeepCrystal achieves an average improvement in recall of 1.4% and 12.1% over its closest competitors, Crysalis II and Crysf, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC over Crysalis II and Crysf, respectively, on the independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web server is also available at https://deeplearning-protein.qcri.org.
UR - http://www.scopus.com/inward/record.url?scp=85062506633&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2018.8621202
DO - 10.1109/BIBM.2018.8621202
M3 - Conference contribution
AN - SCOPUS:85062506633
T3 - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
SP - 2747
EP - 2749
BT - Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
A2 - Schmidt, Harald
A2 - Griol, David
A2 - Wang, Haiying
A2 - Baumbach, Jan
A2 - Zheng, Huiru
A2 - Callejas, Zoraida
A2 - Hu, Xiaohua
A2 - Dickerson, Julie
A2 - Zhang, Le
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 December 2018 through 6 December 2018
ER -