Breast cancer patients at the same stage may show different clinical prognoses or different responses to systemic therapy. Differentially expressed genes in breast cancer were identified from GSE42568. Using survival analysis, receiver operating characteristic (ROC) curves, random forest, GSVA, and Cox regression modeling, we identified genes that could be associated with survival time in breast cancer. The molecular mechanism was investigated by enrichment, GSEA, methylation, and SNV analyses. The expression of a key gene was then verified in the TCGA dataset and by RT-qPCR, Western blot, and immunohistochemistry. We identified 784 genes related to the 5-year overall survival time of breast cancer. Through ROC curve and random forest analysis, 10 prognostic genes were screened. These were integrated into a complex by GSVA, and high expression of the complex was significantly associated with improved recurrence-free survival of patients. In addition, the key genes were related to immune and metabolic functions. Importantly, we identified methylation of MEX3A and TBC1D9 as well as mutation events. Finally, the expression of UGCG was verified in the TCGA dataset and by experimental methods in our own samples. These results indicate that the 10 genes, especially UGCG, may be potential biomarkers and therapeutic targets for long-term survival in breast cancer.
Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians (Open Payments). No potential conflicts of interest were reported.
Background and purpose: Machine learning (ML) is applied for outcome prediction and treatment support. This study aims to develop different ML models to predict the risk of axillary lymph node metastasis (LNM) in breast invasive micropapillary carcinoma (IMPC) and to explore the risk factors of LNM.
Methods: A total of 1547 patients diagnosed with breast IMPC, drawn from the Surveillance, Epidemiology, and End Results (SEER) database and the records of our hospital, were included in this study. ML models were built and externally validated. The SHapley Additive exPlanations (SHAP) framework was applied to explain the optimal model; multivariable analysis was performed with logistic regression (LR); and nomograms were constructed according to the results of the LR analysis.
Results: Age and tumor size were correlated with LNM in both cohorts. The luminal subtype was the most common in patients, with a tumor size ≤20 mm. Compared with the other models, XGBoost was the best ML model, with the highest AUC of 0.813 (95% CI: 0.7994-0.8262) and the lowest Brier score of 0.186 (95% CI: 0.799-0.826). SHAP plots demonstrated that tumor size was the most important risk factor for LNM. In both the training and test sets, XGBoost had a better AUC (0.761 vs 0.745; 0.813 vs 0.775; respectively), and it also achieved a smaller Brier score (0.202 vs 0.204; 0.186 vs 0.191; 0.220 vs 0.221; respectively) than the LR-based nomogram model in the three different sets. After adjusting for the five most influential variables (tumor size, age, ER, HER-2, and PR), the prediction score based on the XGBoost model was still correlated with LNM (adjusted OR: 2.73, 95% CI: 1.30-5.71, P=0.008).
Conclusions: The XGBoost model outperforms the traditional LR-based nomogram model in predicting LNM in IMPC patients. Combined with SHAP, it can more intuitively reflect the influence of different variables on LNM. Tumor size was the most important risk factor for LNM in breast IMPC.
Frontiers in Oncology, frontiersin.org
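The two metrics used above to compare the models, AUC and the Brier score, can be illustrated with a minimal sketch. The function names and the toy labels and probabilities below are hypothetical and are not taken from the study; the sketch only shows how each metric is defined, assuming binary LNM labels (1 = metastasis) and predicted probabilities.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if len(y_true) == 0:
        raise ValueError("empty input")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = lymph node metastasis, 0 = none.
    labels = [1, 0, 1, 0, 1, 0]
    probs = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
    print("AUC:", roc_auc(labels, probs))
    print("Brier score:", brier_score(labels, probs))
```

A higher AUC indicates better discrimination between patients with and without LNM, while a lower Brier score indicates predicted probabilities that lie closer to the observed outcomes, which is why the abstract reports the model with the highest AUC and the smallest Brier score as the best.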