Background
The number of studies applying machine learning (ML) to predict acute kidney injury (AKI) has grown steadily over the last decade. We assess and critically appraise the state of the art in ML models for AKI prediction, considering performance, methodological soundness, and applicability.
Methods
We searched PubMed and ArXiv, extracted data, and critically appraised studies based on the TRIPOD, CHARMS, and PROBAST guidelines.
Results
Forty-six studies from 3166 titles were included. Thirty-eight studies developed a model, five developed and externally validated one, and three externally validated an existing model. Flexible ML methods were used more often than deep learning, although the latter was common when temporal variables and text were used as predictors. Predictive performance showed areas under the receiver operating characteristic curve (AUROC) ranging from 0.49 to 0.99. Our critical appraisal identified a high risk of bias in thirty-nine studies. Some studies lacked internal validation, whereas external validation and interpretability of results were rarely considered. Fifteen studies focused on AKI prediction in the intensive care setting, and the US-derived MIMIC dataset was commonly used. Reproducibility was limited, as data and code were usually unavailable.
Conclusions
Flexible ML methods are popular for the prediction of AKI, although more complex models based on deep learning are emerging. Our critical appraisal identified a high risk of bias in most models. Studies should report calibration measures and perform external validation more often, improve model interpretability, and share data and code to improve reproducibility.
Background
Building machine learning (ML) models in healthcare often relies on time-consuming and potentially biased manual pre-selection of predictors, which can limit the range of suitable models considered. We aimed to compare the predictive performance of automating the process of building ML models (AutoML) with that of expert-based predictor pre-selection followed by logistic regression, for in-hospital mortality prediction in COVID-19 patients at ICU admission.
Methods
We conducted an observational study of all COVID-19 patients admitted to Dutch ICUs between February and July 2020. We included 2,690 COVID-19 patients from 70 ICUs participating in the Dutch National Intensive Care Evaluation (NICE) registry. The main outcome measure was in-hospital mortality. We assessed the performance of AutoML models (at admission and after 24 hours, respectively) against the more traditional approach of predictor pre-selection followed by logistic regression.
Findings
The AutoML models built with variables available at admission showed fair discrimination (average AUROC = 0·75-0·76 (SD = 0·03); PPV = 0·70-0·76 (SD = 0·1) at a cut-off of 0·3, the observed mortality rate) and good calibration. This performance was on par with that of a logistic regression model with predictor selection by three experts (average AUROC = 0·78 (SD = 0·03); PPV = 0·79 (SD = 0·2)). Extending the models with variables available at 24 hours after admission yielded higher predictive performance (average AUROC = 0·77-0·79 (SD = 0·03); PPV = 0·79-0·80 (SD = 0·10-0·17)).
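To illustrate the reported metrics, the sketch below computes AUROC (via the rank-based Mann-Whitney formulation) and PPV at a fixed cut-off. The labels, scores, and the `auroc` helper are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def auroc(y_true, y_score):
    """Rank-based (Mann-Whitney) AUROC: the probability that a randomly
    chosen positive case is scored above a randomly chosen negative case."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic predictions for a binary in-hospital mortality outcome
# (illustrative only; not data from the study).
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=200)                 # 1 = in-hospital death
y_score = np.clip(0.3 + 0.4 * y_true + rng.normal(0, 0.15, 200), 0, 1)

cutoff = 0.3                                          # e.g. observed mortality rate
y_pred = (y_score >= cutoff).astype(int)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
ppv = tp / (tp + fp)                                  # positive predictive value

print(f"AUROC = {auroc(y_true, y_score):.2f}, PPV at {cutoff} = {ppv:.2f}")
```

Reporting PPV at the observed event rate, as the study does, ties the cut-off to the prevalence in the cohort rather than to an arbitrary 0·5 threshold.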
Conclusions
AutoML delivers prediction models with fair discriminative performance and good calibration and accuracy, on par with regression models built with expert-based predictor pre-selection. In the context of the restricted availability of data in an ICU quality registry, extending the models with variables available at 24 hours after admission yielded a small but significant increase in performance.
Acute kidney injury (AKI) is an abrupt decrease in kidney function that is common in intensive care. Many AKI prediction models have been proposed, but the added value of clinical notes and medical terminologies has not yet been analysed. We developed and internally validated a model to predict AKI that includes not only clinical variables but also clinical notes and medical terminologies. Our models performed well overall (AUROC > 0.80); the best model used only clinical variables (AUROC 0.899).