This study is a technical supplement to "AI gone astray: How subtle shifts in patient data send popular algorithms reeling, undermining patient safety" from STAT News, which investigates the effect of time drift on clinically deployed machine learning models. We use MIMIC-IV, a publicly available dataset, to train models that replicate commercial approaches by Dascena and Epic to predict the onset of sepsis, a deadly yet treatable condition. We observe that some of these models degrade over time; most notably, an RNN built on Epic features degrades from an AUC of 0.729 to an AUC of 0.525 over a decade, which leads us to investigate technical and clinical drift as root causes of this performance drop.
Methods

Dataset

We investigate time drift using the MIMIC-IV database [1], which includes electronic health records of over 50,000 patients admitted to the intensive care units at the Beth Israel Deaconess Medical Center (BIDMC) between 2008 and 2019. We filter for patients over the age of 15 with an ICU stay lasting between 24 hours and 10 days, and take each patient's first ICU stay (see Figure 1). After all filtering, our dataset contains about 50k patients.
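The cohort selection described above can be sketched in pandas. This is a minimal illustration, not the study's actual pipeline: the table layout, the column names (`subject_id`, `stay_id`, `age`, `intime`, `outtime`), and the helper `select_cohort` are hypothetical, the duration bounds are assumed to be inclusive, and the filters are assumed to be applied before the first stay is selected, following the order in which the text lists them.

```python
import pandas as pd


def select_cohort(stays: pd.DataFrame) -> pd.DataFrame:
    """Return one row per patient: the earliest ICU stay that passes the filters.

    Assumed (hypothetical) columns: subject_id, stay_id, age, intime, outtime.
    """
    length_of_stay = stays["outtime"] - stays["intime"]

    # Keep patients over 15 with a stay of 24 hours to 10 days
    # (inclusive bounds are an assumption).
    eligible = stays.loc[
        (stays["age"] > 15)
        & (length_of_stay >= pd.Timedelta(hours=24))
        & (length_of_stay <= pd.Timedelta(days=10))
    ]

    # Keep the earliest remaining stay for each patient.
    return (
        eligible.sort_values(["subject_id", "intime"])
        .drop_duplicates(subset="subject_id", keep="first")
        .reset_index(drop=True)
    )


# Toy data for illustration only; these are not MIMIC-IV records.
toy = pd.DataFrame(
    {
        "subject_id": [1, 1, 2, 3, 4, 5],
        "stay_id": [10, 11, 20, 30, 40, 50],
        "age": [64, 64, 14, 50, 71, 30],
        "intime": pd.to_datetime(
            [
                "2010-01-01 00:00",
                "2011-05-01 00:00",
                "2012-02-01 00:00",
                "2013-03-01 00:00",
                "2014-04-01 00:00",
                "2015-07-01 00:00",
            ]
        ),
        "outtime": pd.to_datetime(
            [
                "2010-01-03 00:00",  # 2 days: kept, first stay of patient 1
                "2011-05-04 00:00",  # 3 days: valid, but a later stay of patient 1
                "2012-02-03 00:00",  # 2 days, but patient is 14: dropped
                "2013-03-01 12:00",  # 12 hours: too short
                "2014-04-15 00:00",  # 14 days: too long
                "2015-07-06 00:00",  # 5 days: kept
            ]
        ),
    }
)

cohort = select_cohort(toy)
print(cohort[["subject_id", "stay_id"]])
```

On the toy table, only stays 10 and 50 survive. Reversing the order (taking each patient's first stay and then filtering) would exclude patients whose first stay fails the duration criteria, so the two orderings can yield different cohorts.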