Introduction
Pulmonary nodules (PN) are a common finding on chest computed tomography (CT) and are identified in millions of patients in the United States each year. Accurate diagnosis of PN is crucial for the early detection of cancer and for appropriate treatment. This study aimed to investigate whether machine learning (ML) algorithms can predict malignant PN.
Methods
A total of 130 patients who underwent tumor resection and had a pathologically confirmed diagnosis of PN were enrolled in this study. Random Forest (RF), Support Vector Machine (SVM), Classification and Regression Tree (CART), and eXtreme Gradient Boosting (XGBoost) algorithms were used to predict whether nodules were malignant. The features most important for predicting malignancy were identified using the RF, CART, and XGBoost algorithms.
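Tree-based models such as CART, RF, and XGBoost typically rank features by how much their splits reduce node impurity; the abstract does not state which importance measure the authors used. As a hedged illustration only, the sketch below computes the Gini impurity decrease that a CART-style tree obtains when it splits on a single feature. The function names `gini`, `impurity_decrease`, and `best_split` are hypothetical, and the nodule sizes and labels are made-up toy values, not study data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = malignant, 0 = benign)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p**2 - (1 - p)**2


def impurity_decrease(values, labels, threshold):
    """Gini decrease for the split `value <= threshold`, children weighted by size."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


def best_split(values, labels):
    """Try midpoints between sorted distinct values; return (decrease, threshold)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max((impurity_decrease(values, labels, t), t) for t in candidates)


# Hypothetical toy data (NOT from the study): nodule diameters in mm and labels.
sizes = [4, 6, 7, 9, 12, 15, 18, 22]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

decrease, threshold = best_split(sizes, labels)
print(f"best threshold = {threshold} mm, Gini decrease = {decrease:.5f}")
```

In impurity-based importance, such decreases are summed over every split that uses a feature, weighted by the fraction of samples reaching each node, and normalized; a random forest then averages these values across its trees.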
Results
Eighty patients were included in the final analysis; 62.5% of the nodules were malignant and 37.5% were benign. All four algorithms performed well, with areas under the receiver operating characteristic curve (AUC) of 0.97, 0.92, 0.91, and 0.98 for RF, SVM, CART, and XGBoost, respectively. RF achieved the best overall classification performance, with an accuracy of 0.9583, specificity of 0.8889, sensitivity of 1.0000, kappa of 0.9091, positive predictive value (PPV) of 0.9375, and negative predictive value (NPV) of 1.0000. Age, nodule size, and nodule density were identified as the most important features for predicting malignant PN.
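The six RF metrics reported above are mutually consistent. By back-calculation (our inference, not a figure stated in the abstract), the smallest integer confusion matrix that reproduces all of them is 15 true positives, 0 false negatives, 8 true negatives, and 1 false positive, i.e. 24 evaluated cases. The sketch below, whose function name `binary_metrics` is hypothetical, recomputes the metrics from that assumed matrix, including Cohen's kappa from observed versus chance agreement.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall for the malignant class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    # Cohen's kappa: observed agreement vs agreement expected by chance,
    # where chance agreement uses the predicted and actual class marginals.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "kappa": kappa,
        "ppv": ppv,
        "npv": npv,
    }


# Assumed (back-calculated) confusion matrix for the RF model.
metrics = binary_metrics(tp=15, fn=0, tn=8, fp=1)
print({name: round(value, 4) for name, value in metrics.items()})
```

Because kappa and the other ratios are unchanged when every cell is multiplied by the same factor, larger matrices (e.g. 30, 0, 16, 2) would reproduce the same values; the 24-case matrix is simply the smallest one.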
Conclusion
ML algorithms can accurately predict malignant PN and could help establish an auxiliary model for early diagnosis. Such a model could facilitate the early detection, diagnosis, and treatment of PN, potentially improving quality of life and reducing mortality. However, studies with larger sample sizes are needed to confirm these findings.