2019
DOI: 10.1038/s41746-019-0105-1

Deep learning predicts hip fracture using confounding patient and healthcare variables

Abstract: Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 p…

Cited by 190 publications (135 citation statements) | References 41 publications (71 reference statements)
“…Because under our proposed resampling approach the validation and the training data share the same distribution but the conditional feature distribution p(x|y) of the test data is shifted, the robustness of a CNN to distribution shift can be quantified by comparing the test accuracy curves to the validation accuracy curves in our experiments (see Figures 2, 3, and 4). There are different approaches to carry out such a comparison.…”
Section: Comparison of CNN Models with Respect to Their Robustness
Citation type: mentioning (confidence: 99%)
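The quoted passage describes quantifying a CNN's robustness to distribution shift by comparing its test accuracy curve against its validation accuracy curve across resampling settings. The sketch below only illustrates that comparison and is not code from the cited study; the model interface (a `predict` method returning class labels), the per-setting data splits, and the mean-gap summary are all assumptions.

```python
# Minimal sketch (not from the cited study) of comparing validation and test
# accuracy curves to gauge robustness to distribution shift. The model's
# `predict` interface and the per-setting splits are assumed for illustration.
import numpy as np

def accuracy(model, inputs, labels):
    """Fraction of examples the model classifies correctly."""
    preds = model.predict(inputs)  # assumed: returns hard class predictions
    return float(np.mean(np.asarray(preds) == np.asarray(labels)))

def robustness_curves(model, val_splits, test_splits):
    """Accuracy curves over a series of resampling settings.

    `val_splits` share the training distribution, while `test_splits` have a
    shifted conditional feature distribution p(x|y). The mean gap between the
    two curves is one simple summary of how much the shift hurts the model.
    """
    val_curve = np.array([accuracy(model, x, y) for x, y in val_splits])
    test_curve = np.array([accuracy(model, x, y) for x, y in test_splits])
    mean_gap = float(np.mean(val_curve - test_curve))
    return val_curve, test_curve, mean_gap
```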
“…Recent studies have shown that deep learning methods may not generalize well beyond the training data distribution. For instance, deep learning models are vulnerable to adversarial perturbations [1], are prone to biases and unfairness [2], or may significantly but unknowingly depend on confounding variables resulting from the training data collection process [3]. In this work we focus on distribution shift, which is another important phenomenon that can have a significant negative impact on the performance of deep learning models [4].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…The increasingly large clinical burden of manual EEG analysis has motivated the recent development of automated algorithms for detecting seizures on EEG. Modern deep machine learning methods represent a particularly promising set of approaches for this task, as they have recently seen widespread success in medical domains including skin lesion classification from dermatoscopy 12, automated interpretation of chest radiographs 13, in-hospital mortality prediction from electronic health records 14, and many others 15–21. However, existing deep learning methods rely on the curation and continual maintenance of massive hand-labeled datasets, which has recently been identified as the major bottleneck in supervised medical machine learning 15.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Furthermore, even within a single hospital system, confounded predictions may be a problem for deep learning. For example, Badgeley et al [1] demonstrated that a deep learning hip fracture classifier was leveraging patient-level variables (such as age and gender) and process-level variables (such as scanner model and hospital department) in its predictions. After controlling for these variables during model evaluation by rebalancing the test set, they found that the classifier performed no better than random.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
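The evaluation strategy this excerpt describes, controlling for confounders by rebalancing the test set before re-measuring discrimination, can be illustrated with a short sketch. This is not the authors' code: the column names (`fracture`, `scanner`), the precomputed classifier scores, and the subsampling scheme are assumptions made here for illustration.

```python
# Illustrative sketch (not the authors' code) of rebalancing a test set so a
# confounding variable is equally represented in both outcome classes before
# re-checking discrimination. Column names and classifier scores are assumed.
import pandas as pd
from sklearn.metrics import roc_auc_score

def rebalance_test_set(df, label_col="fracture", confounder_col="scanner", seed=0):
    """Subsample so each confounder level contributes equal numbers of
    positive and negative cases; levels seen in only one class are dropped."""
    pieces = []
    for _, group in df.groupby(confounder_col):
        pos = group[group[label_col] == 1]
        neg = group[group[label_col] == 0]
        n = min(len(pos), len(neg))
        if n == 0:
            continue  # this confounder level cannot be balanced
        pieces.append(pos.sample(n=n, random_state=seed))
        pieces.append(neg.sample(n=n, random_state=seed))
    return pd.concat(pieces, ignore_index=True)

# Hypothetical usage: `score` holds the classifier's predicted probability.
# balanced = rebalance_test_set(test_df)
# print(roc_auc_score(balanced["fracture"], balanced["score"]))
# An AUC near 0.5 on the rebalanced set suggests the original performance
# leaned on the confounder rather than on fracture-related image features.
```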