TY - JOUR
T1 - A modeling framework for embedding-based predictions for compound-viral protein activity
AU - Mall, Raghvendra
AU - Elbasir, Abdurrahman
AU - Almeer, Hossam
AU - Islam, Zeyaul
AU - Kolatkar, Prasanna R.
AU - Chawla, Sanjay
AU - Ullah, Ehsan
N1 - Publisher Copyright:
© 2021 The Author(s). Published by Oxford University Press. All rights reserved.
PY - 2021/9/1
Y1 - 2021/9/1
AB - Motivation: A global effort is underway to identify compounds for the treatment of COVID-19. Since de novo compound design is an extremely long, time-consuming and expensive process, efforts are underway to discover existing compounds that can be repurposed for COVID-19 and new viral diseases. We propose a machine learning representation framework that uses deep learning-induced vector embeddings of compounds and viral proteins as features to predict compound-viral protein activity. The prediction model in turn uses a consensus framework to rank approved compounds against viral proteins of interest. Results: Our consensus framework achieves a high mean Pearson correlation of 0.916, a mean R² of 0.840 and a low mean squared error of 0.313 for the task of compound-viral protein activity prediction on an independent test set. As a use case, we identify a ranked list of 47 compounds common to three main proteins of the SARS-CoV-2 virus (PL-PRO, 3CL-PRO and Spike protein) as potential targets, including 21 antivirals, 15 anticancer, 5 antibiotics and 6 other investigational human compounds. We perform additional molecular docking simulations to demonstrate that the majority of these compounds have low binding energies and thus high binding affinity, with the potential to be effective against the SARS-CoV-2 virus.
UR - http://www.scopus.com/inward/record.url?scp=85105786579&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btab130
DO - 10.1093/bioinformatics/btab130
M3 - Article
C2 - 33638345
AN - SCOPUS:85105786579
SN - 1367-4803
VL - 37
SP - 2544
EP - 2555
JO - Bioinformatics
JF - Bioinformatics
IS - 17
ER -