2019
DOI: 10.1038/s41746-019-0105-1

Deep learning predicts hip fracture using confounding patient and healthcare variables

Abstract: Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 p…

Cited by 190 publications (135 citation statements) | References 41 publications (71 reference statements)
“…Because under our proposed resampling approach the validation and the training data share the same distribution but the conditional feature distribution p(x|y) of the test data is shifted, the robustness of a CNN to distribution shift can be quantified by comparing the test accuracy curves to the validation accuracy curves in our experiments (see Figures 2, 3, and 4). There are different approaches to carry out such a comparison.…”
Section: Comparison of CNN Models with Respect to Their Robustness
Citation type: mentioning (confidence: 99%)
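The quoted passage describes quantifying a CNN's robustness to distribution shift by comparing its test accuracy curve against its validation accuracy curve across resampling settings. The sketch below only illustrates that comparison and is not code from the cited study; the model interface (a `predict` method returning class labels), the per-setting data splits, and the mean-gap summary are all assumptions.

```python
# Minimal sketch (not from the cited study) of comparing validation and test
# accuracy curves to gauge robustness to distribution shift. The model's
# `predict` interface and the per-setting splits are assumed for illustration.
import numpy as np

def accuracy(model, inputs, labels):
    """Fraction of examples the model classifies correctly."""
    preds = model.predict(inputs)  # assumed: returns hard class predictions
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def robustness_curves(model, val_splits, test_splits):
    """Accuracy curves over a series of resampling settings.

    `val_splits` share the training distribution, while `test_splits` have a
    shifted conditional feature distribution p(x|y). The mean gap between the
    two curves is one simple summary of how much the shift hurts the model.
    """
    val_curve = np.array([accuracy(model, x, y) for x, y in val_splits])
    test_curve = np.array([accuracy(model, x, y) for x, y in test_splits])
    mean_gap = float(np.mean(val_curve - test_curve))
    return val_curve, test_curve, mean_gap
```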
“…Recent studies have shown that deep learning methods may not generalize well beyond the training data distribution. For instance, deep learning models are vulnerable to adversarial perturbations [1], are prone to biases and unfairness [2], or may significantly but unknowingly depend on confounding variables resulting from the training data collection process [3]. In this work we focus on distribution shift, which is another important phenomenon that can have a significant negative impact on the performance of deep learning models [4].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…The increasingly large clinical burden of manual EEG analysis has motivated the recent development of automated algorithms for detecting seizures on EEG. Modern deep machine learning methods represent a particularly promising set of approaches for this task, as they have recently seen widespread success in medical domains including skin lesion classification from dermatoscopy 12, automated interpretation of chest radiographs 13, in-hospital mortality prediction from electronic health records 14, and many others 15–21. However, existing deep learning methods rely on the curation and continual maintenance of massive hand-labeled datasets, which has recently been identified as the major bottleneck in supervised medical machine learning 15.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Furthermore, even within a single hospital system, confounded predictions may be a problem for deep learning. For example, Badgeley et al [1] demonstrated that a deep learning hip fracture classifier was leveraging patient-level variables (such as age and gender) and process-level variables (such as scanner model and hospital department) in its predictions. After controlling for these variables during model evaluation by rebalancing the test set, they found that the classifier performed no better than random.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
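The evaluation strategy this excerpt describes, controlling for confounders by rebalancing the test set before re-measuring discrimination, can be illustrated with a short sketch. This is not the authors' code: the column names (`fracture`, `scanner`), the precomputed classifier scores, and the subsampling scheme are assumptions made here for illustration.

```python
# Illustrative sketch (not the authors' code) of rebalancing a test set so a
# confounding variable is equally represented in both outcome classes before
# re-checking discrimination. Column names and classifier scores are assumed.
import pandas as pd
from sklearn.metrics import roc_auc_score

def rebalance_test_set(df, label_col="fracture", confounder_col="scanner", seed=0):
    """Subsample so each confounder level contributes equal numbers of
    positive and negative cases; levels seen in only one class are dropped."""
    pieces = []
    for _, group in df.groupby(confounder_col):
        pos = group[group[label_col] == 1]
        neg = group[group[label_col] == 0]
        n = min(len(pos), len(neg))
        if n == 0:
            continue  # this confounder level cannot be balanced
        pieces.append(pos.sample(n=n, random_state=seed))
        pieces.append(neg.sample(n=n, random_state=seed))
    return pd.concat(pieces, ignore_index=True)

# Hypothetical usage: `score` holds the classifier's predicted probability.
# balanced = rebalance_test_set(test_df)
# print(roc_auc_score(balanced["fracture"], balanced["score"]))
# An AUC near 0.5 on the rebalanced set suggests the original performance
# leaned on the confounder rather than on fracture-related image features.
```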