2021
DOI: 10.1016/j.media.2021.102225

Public Covid-19 X-ray datasets and their impact on model bias – A systematic review of a significant problem


Cited by 50 publications (31 citation statements)
References: 82 publications
“…In Sheykhivand et al (2021), generative adversarial network (GAN) is employed to generate CXR images of COVID-19 category, which has enlarged the sample capacity to nearly 7 times and facilitated more robust feature learning. Notice that the relatively less COVID-19 training samples can lead to model bias, and aiming at this phenomenon, work Garcia Santa Cruz et al (2021) has presented a systematic inspection on public COVID-19 X-ray imaging datasets and provided effective guidance accordingly.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many studies use data from sources with minimal provenance and metadata, and often use data that was not intended for training diagnostic or prognostic tools. A number of datasets aggregate data from different sources, some of which may be aggregates themselves [9]; and many studies aggregate a number of datasets, either to increase their training size or to provide an independent test set. However, this causes a complex set or participants and leads to a high risk that the same images are present in the training and evaluation set.…”
Section: Discussion (mentioning)
confidence: 99%
“…Due to reports of a high risk-of-bias in the field [9, 13, 15], we include a bias assessment. Improper study design, data collection, data partitioning and statistical methods can lead to misleading reported results [14].…”
Section: Methods (mentioning)
confidence: 99%