TY - GEN
T1 - Machine Learning Model for the Identification of Lung Cancer Subtypes based on DNA Methylation
AU - Al-Qirshi, Raghad
AU - Basit, Syed Abdullah
AU - Musleh, Saleh
AU - Islam, Mohammad Tariqul
AU - Alam, Tanvir
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2025/3/10
Y1 - 2025/3/10
N2 - Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the two main histological subtypes of non-small cell lung cancer (NSCLC), accounting for 70% of all lung cancer cases. In this article, we propose an ensemble-based model for identifying NSCLC subtypes from DNA methylation data. The proposed Random Forest-based model, combined with an out-of-bag (OOB) error-based feature selection technique, identified the ten most important CpG sites that strongly differentiate the LUSC and LUAD subtypes of NSCLC with an accuracy, precision and F1 score of. The proposed model outperformed existing models for the same task by a wide margin of 12%. Pathway analysis of the 10 proposed CpG sites revealed different pathways for LUAD- and LUSC-associated genes. LUAD-associated genes primarily participated in TP53, PTEN, GLP-1, incretin regulation, and apoptosis pathways. Conversely, LUSC-associated genes were predominantly involved in pathways for platelet degranulation, serine biosynthesis, and Nephrin family interaction.
AB - Lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) are the two main histological subtypes of non-small cell lung cancer (NSCLC), accounting for 70% of all lung cancer cases. In this article, we propose an ensemble-based model for identifying NSCLC subtypes from DNA methylation data. The proposed Random Forest-based model, combined with an out-of-bag (OOB) error-based feature selection technique, identified the ten most important CpG sites that strongly differentiate the LUSC and LUAD subtypes of NSCLC with an accuracy, precision and F1 score of. The proposed model outperformed existing models for the same task by a wide margin of 12%. Pathway analysis of the 10 proposed CpG sites revealed different pathways for LUAD- and LUSC-associated genes. LUAD-associated genes primarily participated in TP53, PTEN, GLP-1, incretin regulation, and apoptosis pathways. Conversely, LUSC-associated genes were predominantly involved in pathways for platelet degranulation, serine biosynthesis, and Nephrin family interaction.
KW - LUAD
KW - LUSC
KW - Lung Cancer
KW - Machine Learning
UR - http://www.scopus.com/inward/record.url?scp=105002293121&partnerID=8YFLogxK
U2 - 10.1145/3704239.3704242
DO - 10.1145/3704239.3704242
M3 - Conference contribution
AN - SCOPUS:105002293121
T3 - ICHSM 2024 - 2024 7th International Conference on Healthcare Service Management
SP - 52
EP - 56
BT - ICHSM 2024 - 2024 7th International Conference on Healthcare Service Management
PB - Association for Computing Machinery, Inc
T2 - 2024 7th International Conference on Healthcare Service Management, ICHSM 2024
Y2 - 6 September 2024 through 8 September 2024
ER -