At the onset of the COVID-19 pandemic, when individual-level data on COVID-19 patients were not yet available, there was already a need for risk predictors to support prevention and treatment decisions. Here, we report a hybrid strategy to create such a predictor, combining the development of a baseline severe respiratory infection risk predictor with a post-processing method that calibrates the predictions to reported COVID-19 case-fatality rates. With the accumulation of a COVID-19 patient cohort, this predictor is validated to have good discrimination (area under the receiver-operating characteristic curve of 0.943) and calibration (markedly improved compared to that of the baseline predictor). At a 5% risk threshold, 15% of patients are marked as high-risk, achieving a sensitivity of 88%. We thus demonstrate that even at the onset of a pandemic, shrouded in an epidemiologic fog of war, it is possible to provide a useful risk predictor, now widely used in a large healthcare organization.
With the global coronavirus disease 2019 (COVID-19) pandemic, there is an urgent need for risk stratification tools to support prevention and treatment decisions. The Centers for Disease Control and Prevention (CDC) listed several criteria that define high-risk individuals, but multivariable prediction models may allow for a more accurate and granular risk evaluation. In the early days of the pandemic, when the individual-level data required for training prediction models were not available, a large healthcare organization developed a prediction model to support its COVID-19 policy using a hybrid strategy. The model was built on a baseline predictor that ranks patients according to their risk for severe respiratory infection or sepsis (trained on over one million patient records) and was then post-processed to calibrate the predictions to reported COVID-19 case-fatality rates. Since its deployment in mid-March 2020, this predictor has been integrated into many decision processes in the organization that involve allocating limited resources. Once enough COVID-19 patients had accumulated, the predictor was validated for its accuracy in predicting COVID-19 mortality among all COVID-19 cases in the organization (3,176 cases, 3.1% death rate). The predictor was found to have good discrimination, with an area under the receiver-operating characteristic curve of 0.942. Calibration was also good, with a marked improvement over the baseline model when evaluated on the COVID-19 mortality outcome. Whereas the CDC criteria identify 41% of the population as high-risk with a resulting sensitivity of 97%, a 5% absolute risk cutoff by the model flags only 14% of the population as high-risk while still achieving a sensitivity of 90%. To summarize, we found that even in the midst of a pandemic, shrouded in an epidemiologic "fog of war" and with no individual-level data, it was possible to provide a useful predictor with good discrimination and calibration.
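As a rough illustration of the post-processing step this abstract describes, the sketch below rescales a baseline model's risk scores so that the average prediction within each population stratum matches a reported case-fatality rate. The function name, the multiplicative rescaling, and the toy numbers are all illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def recalibrate_to_cfr(baseline_scores, strata, reported_cfr):
    """Hypothetical sketch: rescale baseline risk scores so that the mean
    predicted risk within each stratum (e.g., age group) matches the
    reported COVID-19 case-fatality rate for that stratum. The baseline
    model's ranking of patients within each stratum is preserved."""
    scores = np.asarray(baseline_scores, dtype=float)
    strata = np.asarray(strata)
    calibrated = np.empty_like(scores)
    for group, cfr in reported_cfr.items():
        mask = strata == group
        group_mean = scores[mask].mean()
        # Scale so the group's average prediction equals the reported CFR.
        calibrated[mask] = np.clip(scores[mask] * (cfr / group_mean), 0.0, 1.0)
    return calibrated

# Toy usage with made-up numbers:
scores = [0.02, 0.10, 0.30, 0.05]
ages = ["<60", "<60", "60+", "60+"]
cfr = {"<60": 0.005, "60+": 0.08}  # illustrative, not real reported rates
print(recalibrate_to_cfr(scores, ages, cfr))
```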
Out-of-domain (OOD) generalization is a significant challenge for machine learning models. To overcome it, many novel techniques have been proposed, often focused on learning models with certain invariance properties. In this work, we draw a link between OOD performance and model calibration, arguing that calibration across multiple domains can be viewed as a special case of an invariant representation that leads to better OOD generalization. Specifically, we prove in a simplified setting that models which achieve multi-domain calibration are free of spurious correlations. This leads us to propose multi-domain calibration as a measurable surrogate for the OOD performance of a classifier. An important practical benefit of calibration is that there are many effective tools for calibrating classifiers, and we show that these tools are easy to apply and adapt to a multi-domain setting. Using five datasets from the recently proposed WILDS OOD benchmark [23], we demonstrate that simply re-calibrating models across multiple domains in a validation set leads to significantly improved performance on unseen test domains. We believe this intriguing connection between calibration and OOD generalization is promising from a practical point of view and deserves further research from a theoretical point of view.
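As one concrete instance of the re-calibration tools the abstract alludes to, the sketch below fits a single temperature-scaling parameter by grid search so that the worst-case validation negative log-likelihood across domains is minimized. The shared temperature, the worst-case objective, and the function names are our own assumptions for illustration, not necessarily the paper's exact procedure.

```python
import numpy as np

def nll(logits, labels, temperature):
    """Negative log-likelihood of integer labels under temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_shared_temperature(val_by_domain, grid=np.linspace(0.5, 5.0, 91)):
    """Pick one temperature minimizing the worst-case validation NLL across
    domains, so the classifier is calibrated on every domain rather than
    only on the pooled validation data.

    val_by_domain: list of (logits, labels) pairs, one per validation domain.
    """
    def worst_case_nll(t):
        return max(nll(logits, labels, t) for logits, labels in val_by_domain)
    return min(grid, key=worst_case_nll)

# Usage: T = fit_shared_temperature(val_by_domain); test_probs use logits / T.
```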
Constructing fast numerical solvers for partial differential equations (PDEs) is crucial for many scientific disciplines. A leading technique for solving large-scale PDEs is using multigrid methods. At the core of a multigrid solver is the prolongation matrix, which maps between the different scales of the problem. This matrix is strongly problem-dependent, and its optimal construction is critical to the efficiency of the solver. In practice, however, devising multigrid algorithms for new problems often poses formidable challenges. In this paper, we propose a framework for learning multigrid solvers. Our method learns a (single) mapping from a family of parameterized PDEs to prolongation operators. We train a neural network once for the entire class of PDEs, using an efficient and unsupervised loss function. Experiments on a broad class of 2D diffusion problems demonstrate improved convergence rates compared to the widely used Black-Box multigrid scheme, suggesting that our method successfully learned rules for constructing prolongation matrices.
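For context on where the learned component fits, the sketch below shows a standard two-grid correction cycle with damped-Jacobi smoothing and a Galerkin coarse operator; the prolongation matrix P is passed in by hand here, whereas in the paper it would be produced by the trained network. Function and parameter names are illustrative, and this is a dense-matrix toy, not a scalable solver.

```python
import numpy as np

def two_grid_step(A, P, x, b, nu=2, omega=0.8):
    """One two-grid correction step for A x = b with a given prolongation P.
    Restriction is taken as P.T and the coarse operator is the Galerkin
    product P.T @ A @ P, so the quality of P drives the convergence rate."""
    D = np.diag(np.diag(A))
    for _ in range(nu):                        # pre-smoothing (damped Jacobi)
        x = x + omega * np.linalg.solve(D, b - A @ x)
    r_coarse = P.T @ (b - A @ x)               # restrict the fine-grid residual
    A_coarse = P.T @ A @ P                     # Galerkin coarse-grid operator
    x = x + P @ np.linalg.solve(A_coarse, r_coarse)  # coarse-grid correction
    for _ in range(nu):                        # post-smoothing
        x = x + omega * np.linalg.solve(D, b - A @ x)
    return x
```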
Saliency methods are a popular approach for model debugging and explainability. However, in the absence of ground-truth data for what the correct maps should be, evaluating and comparing different approaches remains a long-standing challenge. The sanity-checks methodology of Adebayo et al. [NeurIPS 2018] sought to address this challenge: they argue that some popular saliency methods should not be used for explainability purposes, since the maps they produce are not sensitive to the underlying model that is to be explained. Through a causal re-framing of their objective, we argue that their empirical evaluation does not fully establish these conclusions, due to a form of confounding introduced by the tasks they evaluate on. Through various experiments on simple custom tasks, we demonstrate that some of their conclusions may indeed be artifacts of the tasks more than a criticism of the saliency methods themselves. More broadly, our work challenges the utility of the sanity-checks methodology, and further highlights that saliency map evaluation beyond ad hoc visual examination remains a fundamental challenge.
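For reference, here is a minimal sketch of the model-parameter randomization test from Adebayo et al. that this abstract responds to, assuming a PyTorch-style model and caller-supplied saliency_fn and similarity_fn (both hypothetical placeholders): layers are re-initialized starting from the output end, and the saliency map's similarity to the original is recorded at each step. High similarity after randomization suggests the saliency method is insensitive to the learned weights.

```python
import copy

def cascading_randomization_check(model, saliency_fn, x, similarity_fn):
    """Sketch of the model-parameter randomization sanity check.

    model: a PyTorch nn.Module (assumed).
    saliency_fn(model, x): returns a saliency map for input x (placeholder).
    similarity_fn(a, b): compares two maps, e.g. rank correlation (placeholder).
    """
    reference = saliency_fn(model, x)
    randomized = copy.deepcopy(model)  # leave the original model untouched
    scores = []
    # Reversed pre-order traversal approximates randomizing from the
    # output layer back toward the input for typical feedforward models.
    for name, module in reversed(list(randomized.named_modules())):
        if hasattr(module, "reset_parameters"):
            module.reset_parameters()  # randomize this layer's weights
            scores.append((name, similarity_fn(reference, saliency_fn(randomized, x))))
    return scores
```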