Novel Biomarker Prediction for Lung Cancer Using Random Forest Classifiers

Lavanya, C; Pooja, S; Kashyap, Abhay H; Rahaman, Abdur; Niranjan, Swarna; Niranjan, Vidya

doi:10.1177/11769351231167992

Cited by 6 publications

(1 citation statement)

References 64 publications

(76 reference statements)


“…A significant feature of our study is the unique integration of Lasso and RF methods, which resulted in remarkable predictive performance. The feature selection method of Lasso ( Ghosh and Chinnaiyan, 2005 ; Tsagris et al, 2018 ) and RF ( C et al, 2023 ; Toth et al, 2019 ) has become a prevalent approach in biology for more effectively identifying essential biomarkers. Until now, no research has created a CKD prediction model utilizing gene sequencing, particularly due to the scarcity of kidney tissue samples from CKD patients, which are challenging to obtain.…”

Section: Discussion (mentioning)

confidence: 99%

Development and evaluation of a chronic kidney disease risk prediction model using random forest

Mendapara. 2024. Front. Genet.


This research aims to advance the detection of Chronic Kidney Disease (CKD) through a novel gene-based predictive model, leveraging recent breakthroughs in gene sequencing. We sourced and merged gene expression profiles of CKD-affected renal tissues from the Gene Expression Omnibus (GEO) database, classifying them into two sets for training and validation in a 7:3 ratio. The training set included 141 CKD and 33 non-CKD specimens, while the validation set had 60 and 14, respectively. The disease risk prediction model was constructed using the training dataset, while the validation dataset confirmed the model’s identification capabilities. The development of our predictive model began with evaluating differentially expressed genes (DEGs) between the two groups. We isolated six genes using Lasso and random forest (RF) methods—DUSP1, GADD45B, IFI44L, IFI30, ATF3, and LYZ—which are critical in differentiating CKD from non-CKD tissues. We refined our random forest (RF) model through 10-fold cross-validation, repeated five times, to optimize the mtry parameter. The performance of our model was robust, with an average AUC of 0.979 across the folds, translating to a 91.18% accuracy. Validation tests further confirmed its efficacy, with a 94.59% accuracy and an AUC of 0.990. External validation using dataset GSE180394 yielded an AUC of 0.913, 89.83% accuracy, and a sensitivity rate of 0.889, underscoring the model’s reliability. In summary, the study identified critical genetic biomarkers and successfully developed a novel disease risk prediction model for CKD. This model can serve as a valuable tool for CKD disease risk assessment and contribute significantly to CKD identification.
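The split sizes and the resampling scheme described in this abstract can be checked with a little arithmetic. The sketch below is only an illustration, not the authors' code: it assumes the 7:3 split was made per class with rounding (the abstract does not say the split was stratified, though the reported counts are consistent with that), and it builds plain, non-stratified folds for the 10-fold cross-validation repeated five times. The function names `stratified_counts` and `repeated_kfold` are hypothetical, and no model fitting or mtry tuning is performed here.

```python
import random

def stratified_counts(n_per_class, train_frac=0.7):
    """Return {label: (n_train, n_valid)}, rounding per class (an assumption)."""
    out = {}
    for label, n in n_per_class.items():
        n_train = round(n * train_frac)
        out[label] = (n_train, n - n_train)
    return out

def repeated_kfold(n_samples, k=10, repeats=5, seed=0):
    """Yield (train_idx, valid_idx) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle per repeat
        folds = [idx[i::k] for i in range(k)]  # k near-equal folds
        for i in range(k):
            valid = folds[i]
            train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
            yield train, valid

# Class totals taken from the abstract: training + validation specimens.
totals = {"CKD": 141 + 60, "non-CKD": 33 + 14}
split = stratified_counts(totals)

# Size of the training set, then the 10 x 5 = 50 resampling partitions.
n_train = sum(n_tr for n_tr, _ in split.values())
partitions = list(repeated_kfold(n_train))
```

Under these assumptions, `split` reproduces the reported 141/60 and 33/14 counts, and each candidate mtry value would be scored on 50 train/validation partitions of the 174 training specimens.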


Integrated analysis of diverse cancer types reveals a breast cancer-specific serum miRNA biomarker through relative expression orderings analysis

Ma, Gao, Huo et al. 2024. Breast Cancer Res Treat


Purpose: Serum microRNA (miRNA) holds great potential as a non-invasive biomarker for diagnosing breast cancer (BrC). However, most diagnostic models rely on the absolute expression levels of miRNAs, which are susceptible to batch effects and challenging for clinical transformation. Furthermore, current studies on liquid biopsy diagnostic biomarkers for BrC mainly focus on distinguishing BrC patients from healthy controls, needing more specificity assessment.

Methods: We collected a large number of miRNA expression data involving 8465 samples from GEO, including 13 different cancer types and non-cancer controls. Based on the relative expression orderings (REOs) of miRNAs within each sample, we applied the greedy, LASSO multiple linear regression, and random forest algorithms to identify a qualitative biomarker specific to BrC by comparing BrC samples to samples of other cancers as controls.

Results: We developed a BrC-specific biomarker called 7-miRPairs, consisting of seven miRNA pairs. It demonstrated comparable classification performance in our analyzed machine learning algorithms while requiring fewer miRNA pairs, accurately distinguishing BrC from 12 other cancer types. The diagnostic performance of 7-miRPairs was favorable in the training set (accuracy = 98.47%, specificity = 98.14%, sensitivity = 99.25%), and similar results were obtained in the test set (accuracy = 97.22%, specificity = 96.87%, sensitivity = 98.02%). KEGG pathway enrichment analysis of the 11 miRNAs within the 7-miRPairs revealed significant enrichment of target mRNAs in pathways associated with BrC.

Conclusion: Our study provides evidence that utilizing serum miRNA pairs can offer significant advantages for BrC-specific diagnosis in clinical practice by directly comparing serum samples with BrC to other cancer types.
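To make the relative expression ordering (REO) idea concrete, here is a minimal sketch of a pair-based qualitative classifier. It is an assumption-laden illustration rather than the published 7-miRPairs biomarker: the miRNA names are placeholders, the expression values are invented, and the majority-vote decision rule is assumed because the abstract does not state how the pair votes are combined.

```python
def reo_votes(sample, pairs):
    """Count the pairs (a, b) for which sample[a] > sample[b] holds."""
    return sum(1 for a, b in pairs if sample[a] > sample[b])

def classify(sample, pairs, min_votes=None):
    """Call a sample positive when enough pair orderings hold.

    The default threshold is a strict majority of the pairs (an assumption).
    """
    if min_votes is None:
        min_votes = len(pairs) // 2 + 1
    return reo_votes(sample, pairs) >= min_votes

# Placeholder pairs and expression values, for illustration only.
PAIRS = [("miR-A", "miR-B"), ("miR-C", "miR-D"), ("miR-E", "miR-A")]
sample = {"miR-A": 5.1, "miR-B": 2.0, "miR-C": 1.2, "miR-D": 3.3, "miR-E": 7.8}

# A monotone rescaling of the whole sample, standing in for a batch effect.
scaled = {name: 10 * value + 3 for name, value in sample.items()}

votes_raw = reo_votes(sample, PAIRS)
votes_scaled = reo_votes(scaled, PAIRS)
```

Because only the within-sample ordering of each pair matters, `votes_raw` and `votes_scaled` agree. That invariance to any increasing transformation applied to a whole sample is why REO-based markers are less sensitive to batch effects than absolute expression levels.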


Groundwater quality assessment using machine learning models: a comprehensive study on the industrial corridor of a semi-arid region

Krishnamoorthy, Lakshmanan. 2024. Environ Sci Pollut Res
