2021
DOI: 10.1109/access.2021.3079716

Discovery of a Generalization Gap of Convolutional Neural Networks on COVID-19 X-Rays Classification

Abstract: A number of recent papers have shown experimental evidence that suggests it is possible to build highly accurate deep neural network models to detect COVID-19 from chest X-ray images. In this paper, we show that good generalization to unseen sources has not been achieved. Experiments with richer data sets than have previously been used show models have high accuracy on seen sources, but poor accuracy on unseen sources. The reason for the disparity is that the convolutional neural network model, which learns fe…
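The gap described in the abstract can be made concrete with a cross-source (leave-one-group-out) evaluation: train on images from some sources and test on a source never seen during training. The sketch below is illustrative only and is not the paper's code; the synthetic data, the logistic-regression stand-in for a CNN, and the injected source-specific shortcut feature are all assumptions made for the example.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
n_per_source, n_sources, n_feat = 200, 3, 50

X_parts, y_parts, g_parts = [], [], []
for s in range(n_sources):
    labels = rng.integers(0, 2, n_per_source)
    feats = rng.normal(size=(n_per_source, n_feat))
    # Inject a source-specific marker that correlates with the label only
    # within this source (a stand-in for tokens, positioning, or device
    # signatures that a model can learn as a shortcut).
    feats[:, s] += 3.0 * labels
    X_parts.append(feats)
    y_parts.append(labels)
    g_parts.append(np.full(n_per_source, s))

X = np.vstack(X_parts)
y = np.concatenate(y_parts)
groups = np.concatenate(g_parts)

# Hold out one source at a time and compare accuracy on seen vs unseen data.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(f"held-out source {groups[test_idx][0]}: "
          f"seen={clf.score(X[train_idx], y[train_idx]):.2f} "
          f"unseen={clf.score(X[test_idx], y[test_idx]):.2f}")

On synthetic data like this, accuracy on the training (seen) sources stays high while accuracy on the held-out (unseen) source drops toward chance, mirroring the disparity the abstract reports.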

Cited by 33 publications (25 citation statements)
References 43 publications
“…This is a specific shortcoming of deep learning algorithms as they do not preclude the algorithm from learning features present in the training set that are arbitrarily correlated with the disease, yet are completely irrelevant. These can stem from characteristics of the imaging devices or clinical practices such as patient positioning [23, 24, 25] used at the specific locations. If a model implicitly learns such features it will not generalize well when presented with data obtained on different imaging devices or using different clinical workflows, both of which are irrelevant to disease diagnosis.…”
Section: Previous Work (mentioning)
confidence: 99%
“…This has been frequently reported in the recent literature, for instance, on the classification of abnormal chest radiographs, 47 on the diagnosis of pneumonia, 48 and on the detection of COVID-19 from chest radiographs. 49 In addition to the possible reasons discussed in the previous paragraph, a particular issue found by these authors is that CNNs have readily learned features specific to sites (such as metal tokens placed on patients at a specific site) rather than the pathology information in the image. As such, proper testing procedures that mimic real-world implementation are crucial before any AI clinical deployment.…”
Section: Potential Use Cases and Issues (mentioning)
confidence: 99%
“…Also we attempted to test the proposed architecture on two datasets with different sizes. To demonstrate the obtained results, we compared using the same metrics with state-of-the-art methods including Islam et al [5], Chowdhury et al [27], Rahimzade et al [41], Ucar et al [42], An et al [43], Ozturk et al [44], Punn et al [45], Narin et al [46], Ozcan et al [47], Bukhari et al [48], Mukherjee et al [49], Shankar et al [53], Yamaç et al [54], Zhou et al [55], Tang et al [56], Narin et al [57], Ahsan et al [58], and Kaoutar Ben et al [59]. This section provides a description of the used datasets, experimental setup of the proposed deep-learning-based model, and also a discussion of the obtained results.…”
Section: Results (mentioning)
confidence: 99%