Background: Patients with pulmonary embolism (PE) who prematurely discontinue anticoagulant therapy (<90 days) are at an increased risk for death or recurrences.
Methods: We used data from the RIETE registry to compare the prognostic ability of 5 machine-learning (ML) models and logistic regression to identify patients at increased risk for the composite of fatal PE or recurrent venous thromboembolism (VTE) 30 days after discontinuation. The ML models were decision tree, k-nearest neighbors, support vector machine, ensemble and neural network (NN). A “full” model with 70 variables and a “reduced” model with 23 variables were analyzed. Model performance was assessed by confusion matrix metrics on the testing data for each model and by a calibration plot.
Results: Among 34,447 patients with PE, 1,348 (3.9%) discontinued therapy prematurely. Fifty-one (3.8%) developed fatal PE or sudden death and 24 (1.8%) had non-fatal VTE recurrences within 30 days after discontinuation. ML-NN was the best method for identifying patients who experienced the composite endpoint, with an area under the receiver operating characteristic (ROC) curve of 0.96 (95% confidence interval [CI], 0.95-0.98), using either 70 or 23 variables captured before discontinuation. Similar numbers were obtained for sensitivity, specificity, positive predictive value, negative predictive value and accuracy. The discrimination of logistic regression was inferior (area under the ROC curve, 0.76 [95% CI, 0.70-0.81]). The calibration plot showed similar deviations from the perfect line for ML-NN and logistic regression.
Conclusions: The ML-NN method predicted the composite outcome after premature discontinuation of anticoagulation very well and outperformed traditional logistic regression.
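The confusion matrix metrics named in the Methods and Results (sensitivity, specificity, positive predictive value, negative predictive value and accuracy) follow directly from the four cells of a binary confusion matrix. The sketch below is a minimal illustration in Python, not the authors' code: the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return standard binary confusion-matrix metrics as a dict of floats.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives and false negatives. Assumes every denominator is > 0.
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # share of events that were flagged
        "specificity": tn / (tn + fp),   # share of non-events that were cleared
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / total,   # share of all predictions that were correct
    }


# Hypothetical counts for illustration only (not from the RIETE data).
metrics = confusion_metrics(tp=40, fp=10, tn=930, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, accuracy and negative predictive value can remain high even when sensitivity is modest, which is one reason to report all five metrics together.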
Summary
Predictive tools for major bleeding (MB) using machine learning (ML) might be advantageous over traditional methods. We used data from the Registro Informatizado de Enfermedad TromboEmbólica (RIETE) to develop ML algorithms to identify patients with venous thromboembolism (VTE) at increased risk of MB during the first 3 months of anticoagulation. A total of 55 baseline variables were used as predictors. New data prospectively collected from the RIETE were used for further validation. The RIETE and VTE‐BLEED scores were used for comparisons. External validation was performed with the COMMAND‐VTE database. Learning was carried out with data from 49 587 patients, of whom 873 (1.8%) had MB. The best performing ML method was XGBoost. In the prospective validation cohort the sensitivity, specificity, positive predictive value and F1 score were 33.2%, 93%, 10% and 15.4%, respectively. The F1 values for the RIETE and VTE‐BLEED scores were 8.6% and 6.4%, respectively. In the external validation cohort the corresponding metrics were 10.3%, 87.6%, 3.5% and 5.2%, respectively. In that cohort, the F1 value was 17.3% for the RIETE score and 9.75% for the VTE‐BLEED score. The XGBoost algorithm performed better than the RIETE and VTE‐BLEED scores in the prospective validation cohort, but not in the external validation cohort.
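The F1 score reported above is the harmonic mean of the positive predictive value (precision) and the sensitivity (recall). As a small consistency check, the Python sketch below recomputes F1 from the rounded percentages given in the summary. It is an illustration only, not the authors' code, and the function name `f1_from_ppv_sensitivity` is hypothetical.

```python
def f1_from_ppv_sensitivity(ppv, sensitivity):
    """Harmonic mean of positive predictive value and sensitivity."""
    if ppv + sensitivity == 0:
        return 0.0  # convention when both inputs are zero
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Inputs are the rounded values quoted in the summary.
f1_prospective = f1_from_ppv_sensitivity(ppv=0.10, sensitivity=0.332)
f1_external = f1_from_ppv_sensitivity(ppv=0.035, sensitivity=0.103)

print(f"prospective cohort F1: {f1_prospective:.1%}")  # 15.4%
print(f"external cohort F1: {f1_external:.1%}")        # 5.2%
```

Both recomputed values agree with the reported F1 scores of 15.4% and 5.2%.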