DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction

Abdurrahman Elbasir, Balasubramanian Moovarkumudalvan, Khalid Kunji, Prasanna R. Kolatkar, Halima Bensmail, Raghvendra Mall

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Citations (Scopus)

Abstract

Protein structure determination has primarily been performed using X-ray crystallography. To overcome the expensive cost, high attrition rate and series of trial-and-error settings, many in-silico methods have been developed to predict crystallization propensities of proteins based on their sequences. However, majority of these methods build predictors by extracting features from protein sequences which is computationally expensive and can potentially explode the feature space. We propose, DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins which can produce diffraction quality crystals without the need to manually engineer additional biochemical and structural features from sequence. Our model is based on Convolutional Neural Networks (CNNs) which can exploit frequently occurring k-mers and sets of k-mers from the protein sequences to discriminate diffraction quality crystals from non-crystallizable ones. Our model outperforms previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and MCC on three independent test sets. DeepCrystal achieves an average improvement of 1.4 %, 12.1% in recall, when compared to its closest competitors, Crysalis II and Crysf respectively. In addition, DeepCrystal attains an average improvement of 2.1%, 6.0% for F-score, 1.9%, 3.9% for accuracy and 3.8%, 7.0% for MCC respectively w.r.t. Crysalis II and Crysf on independent test sets. The standalone source code and models are available at https://github.com/elbasir/DeepCrystal and a web-server is also available at https://deeplearning-protein.qcri.org.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
EditorsHarald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2747-2749
Number of pages3
ISBN (Electronic)9781538654880
DOIs
Publication statusPublished - 21 Jan 2019
Event2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: 3 Dec 20186 Dec 2018

Publication series

NameProceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country/TerritorySpain
CityMadrid
Period3/12/186/12/18

Fingerprint

Dive into the research topics of 'DeepCrystal: A Deep Learning Framework for Sequence-based Protein Crystallization Prediction'. Together they form a unique fingerprint.

Cite this