Background: This study aimed to construct prognostic models of colorectal cancer (CRC) recurrence and metastasis (R&M) that incorporate traditional Chinese medicine (TCM) factors, using different machine learning (ML) methods, in order to address the lack of TCM factors in existing models. Methods: Patients with stage I-III CRC who had undergone radical resection formed the model data set. The data were randomly divided into a training set and an internal validation set at a ratio of 7:3 by the hold-out method. The average performance indices and 95% confidence intervals of the models were calculated over 100 repeated tests. Eight factors were used as Western medicine predictors. Two types of models were constructed, taking either “whether to accept TCM intervention” or “different TCM syndrome types” as the TCM predictor. The models were built with four ML methods: logistic regression, random forest, Extreme Gradient Boosting (XGBoost), and support vector machine (SVM). The prediction target was whether R&M would occur within 3 years and 5 years after radical surgery. The area under the curve (AUC) and decision curve analysis (DCA) were used to evaluate the accuracy and clinical utility of the models. Results: The model data set consisted of 558 patients, of whom 317 received TCM intervention after radical resection. The models built with the four ML methods and the TCM factor “whether to accept TCM intervention” showed good ability to predict R&M within 3 years and 5 years (AUC > 0.75), and XGBoost was the best method. The DCA indicated that the models provided additional clinical benefit within a certain range of R&M threshold probabilities. In the models with the TCM factor “different TCM syndrome types”, all four methods showed some ability to predict R&M within 3 years and 5 years (AUC > 0.70). With the exception of the SVM model, the methods provided additional clinical benefit within a certain range of threshold probabilities. Conclusion: The prognostic models based on ML methods show good accuracy and clinical utility. They can quantify the influence of TCM factors on R&M and offer some value for clinical decision-making.
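The abstract above evaluates clinical utility with decision curve analysis. As a rough illustration only, the net benefit that such a curve plots at a given threshold probability can be sketched as follows. This uses the standard net-benefit definition (true positives minus false positives weighted by the threshold odds, both per patient); the function names and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 1 = R&M occurred, 0 = no R&M.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.6, 0.05]

for t in (0.1, 0.3, 0.5):
    model_nb = net_benefit(y_true, y_prob, t)
    all_nb = net_benefit_treat_all(y_true, t)
    print(f"threshold={t:.1f}  model={model_nb:.3f}  treat-all={all_nb:.3f}")
```

A model is usually said to add clinical benefit at a threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).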
Purpose: The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC), which can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. The possibility of improving prediction accuracy using nonlinear methods compared to linear methods was investigated. Patients and Methods: A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. The model performance was evaluated using the area under the receiver operating characteristic curve (AUC) values. Results: The XGBoost approach showed the highest AUC values of 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance for the CSS and R&M models were different, and higher model accuracy was associated with more prognostic predictors. Conclusion: Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared.
The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that the model performance can be improved when more prognostic predictors are considered.
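Both abstracts report an average AUC with a 95% confidence interval obtained from 100 repeated 70/30 splits. The sketch below shows one way such a procedure could look. Several things are assumptions made for illustration: the interval is taken as approximate 2.5th and 97.5th percentiles of the repeated AUC values (the abstracts do not say how the interval was computed), `fit_centroid` is a hypothetical stand-in model rather than logistic regression, random forest, or XGBoost, and the data are synthetic.

```python
import random
import statistics


def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X_train, y_train):
    """Hypothetical toy model: score = projection onto (positive mean - negative mean)."""
    dims = range(len(X_train[0]))
    pos = [x for x, y in zip(X_train, y_train) if y == 1]
    neg = [x for x, y in zip(X_train, y_train) if y == 0]
    direction = [statistics.mean(x[d] for x in pos) - statistics.mean(x[d] for x in neg)
                 for d in dims]
    return lambda x: sum(w * v for w, v in zip(direction, x))


def repeated_holdout_auc(X, y, fit, n_repeats=100, test_frac=0.3, seed=0):
    """Repeat a random train/validation split and summarise the validation AUC."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = int(round(test_frac * len(y)))
    aucs = []
    while len(aucs) < n_repeats:
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        y_train = [y[i] for i in train]
        y_test = [y[i] for i in test]
        if len(set(y_train)) < 2 or len(set(y_test)) < 2:
            continue  # skip splits where a class is missing (AUC undefined)
        score = fit([X[i] for i in train], y_train)
        aucs.append(auc(y_test, [score(X[i]) for i in test]))
    aucs.sort()
    # Approximate percentile bounds (assumed interval definition).
    low = aucs[int(0.025 * (len(aucs) - 1))]
    high = aucs[int(round(0.975 * (len(aucs) - 1)))]
    return statistics.mean(aucs), (low, high)


# Synthetic data: feature 0 is shifted for positive cases, feature 1 is noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[data_rng.gauss(float(label), 1.0), data_rng.gauss(0.0, 1.0)] for label in y]

mean_auc, (ci_low, ci_high) = repeated_holdout_auc(X, y, fit_centroid)
print(f"AUC {mean_auc:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

Swapping `fit_centroid` for any function that returns a scoring callable would allow different learners to be compared under the same splits, which is the kind of comparison the abstracts describe.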