2020
DOI: 10.1101/2020.12.03.410845
Preprint

The Impact of Digital Histopathology Batch Effect on Deep Learning Model Accuracy and Bias

Abstract: The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be ide…
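Whether a feature such as mutation status varies across submitting sites can be checked with a standard test of independence. The sketch below computes Pearson's chi-square statistic for a sites-by-categories count table. It is an illustrative assumption, not the authors' analysis: the function name `chi_square_independence` and the toy counts are hypothetical, and every row and column total is assumed to be positive.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a contingency
    table whose rows are submitting sites and whose columns are outcome
    categories (e.g. mutant vs wild-type). Hypothetical helper for illustration."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(n_rows)) for c in range(n_cols)]
    grand_total = sum(row_totals)

    stat = 0.0
    for r in range(n_rows):
        for c in range(n_cols):
            # Expected count if the category mix were identical at every site.
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (table[r][c] - expected) ** 2 / expected

    dof = (n_rows - 1) * (n_cols - 1)
    return stat, dof
```

For the toy table `[[30, 10], [10, 30]]` (two sites, two categories), the function returns a statistic of 20.0 with 1 degree of freedom. A p-value can then be read from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, dof)`.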

Cited by 17 publications (12 citation statements)
References 63 publications
“…Additionally, inter-scanner variability may further exacerbate such biases. Recent research also suggests that WSIs preserve site-specific information which can be learned by a deep learning algorithm, resulting in overestimation of model performance [23]. Stain colour normalisation aims to mitigate such batch effects by transforming pixel values from different WSIs within a data set to a common distribution.…”
Section: Colour Normalisation and Augmentation (mentioning, confidence: 99%)
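To illustrate the idea of "transforming pixel values from different WSIs within a data set to a common distribution", the sketch below shifts and scales each colour channel of an image tile so that its mean and standard deviation match those of a reference tile. It is a simplified, assumed variant of Reinhard-style normalisation: Reinhard's method performs this matching in a perceptual (lαβ) colour space, whereas this sketch works directly in RGB. The function name `match_channel_statistics` is hypothetical.

```python
import numpy as np


def match_channel_statistics(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Match per-channel mean and standard deviation of `source` to `target`.

    Both inputs are H x W x C float arrays with values in [0, 1].
    Simplified RGB variant of Reinhard-style normalisation (an assumption
    for illustration, not the cited authors' pipeline).
    """
    src = source.astype(np.float64)
    tgt = target.astype(np.float64)

    # Per-channel statistics over all pixels.
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    tgt_mean, tgt_std = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))

    # Treat (near-)flat channels as unit variance to avoid dividing by zero.
    src_std = np.where(src_std > 1e-8, src_std, 1.0)

    # Standardise the source, then rescale to the reference statistics.
    out = (src - src_mean) / src_std * tgt_std + tgt_mean
    return np.clip(out, 0.0, 1.0)
```

Calling it with a slide tile as `source` and a fixed reference tile as `target` yields a tile whose per-channel statistics follow the reference. Stain-deconvolution approaches, such as Macenko's method, model the haematoxylin and eosin stains explicitly, at the cost of more machinery.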
“…For CV applications, lack of model generalizability is often a result of the effect of spurious correlates introduced as a result of the WSI preparation process, also known as batch effects [12,13,14]. Mitigating all forms of batch effects parametrically incurs challenges since batch effects may arise from different parts of the tissue preprocessing pipeline such as the scanner acquisition protocol, slide preparation date and thickness of tissue sections [15,16,17,18]. These batch effects remain detectable by machine learning algorithms and can induce spurious correlates.…”
Section: Spurious Confounders In Digital Pathology (mentioning, confidence: 99%)
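One simple way to check whether batch effects "remain detectable by machine learning algorithms" is to measure how well a trivial classifier predicts the submitting site from image features. The sketch below uses leave-one-out nearest-centroid classification on per-slide feature vectors (for example, pooled embeddings from a pretrained network, which is an assumption here). Accuracy well above chance, which is roughly 1 / number of sites when sites are balanced, suggests a site signal. The function name `site_predictability` is hypothetical, and every site is assumed to contribute at least two slides.

```python
import numpy as np


def site_predictability(features: np.ndarray, sites: np.ndarray) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier that predicts
    the submitting site from per-slide feature vectors (n_slides x n_features).
    Hypothetical diagnostic for illustration."""
    labels = np.unique(sites)
    correct = 0
    for i in range(len(features)):
        # Hold out slide i and compute each site's centroid from the rest.
        mask = np.ones(len(features), dtype=bool)
        mask[i] = False
        centroids = np.stack(
            [features[mask & (sites == s)].mean(axis=0) for s in labels]
        )
        # Assign the held-out slide to the site with the closest centroid.
        dists = np.linalg.norm(centroids - features[i], axis=1)
        correct += int(labels[np.argmin(dists)] == sites[i])
    return correct / len(features)
```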
“…Mitigating all forms of batch effects parametrically incurs challenges since batch effects may arise from different parts of the tissue pre-processing pipeline [9]. Being able to predict artifacts of the scan, such as scanner manufacturer and acquisition protocol [10], slide preparation date, source site from which the scan was taken [10, 11], and image quality [12], can induce spurious correlates. Models are likely to learn these spurious correlates when trained to near-zero training error in the weak-label regime [13].…”
Section: Introduction (mentioning, confidence: 99%)
“…Nonetheless, compiling such international large-scale datasets is costly and unfeasible in most cases and does not eliminate batch effects. Regardless of the dataset size and number of contributing sites, the propensity for overfitting of digital histology models to site-level characteristics is incompletely characterized and is infrequently accounted for in internal validation of deep learning models [19]. For example, assessments of stain normalization and augmentation techniques have focused on the performance of models in validation sets, rather than true elimination of batch effect [15, 20].…”
Section: Introduction (mentioning, confidence: 99%)
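Accounting for site in internal validation typically means keeping every slide from a submitting site on the same side of each train/validation split, so that a model cannot score well merely by recognising site. The sketch below assigns whole sites to k folds with a greedy size-balancing heuristic. It is a hypothetical helper (`site_grouped_folds`), not the procedure of the cited works, and it assumes k does not exceed the number of sites. scikit-learn's `GroupKFold` offers the same no-shared-group guarantee.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def site_grouped_folds(
    sites: Sequence[Hashable], k: int
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, val_indices) splits in which all samples
    from a given submitting site fall in the same fold."""
    by_site: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    # Greedy balancing: place the largest sites first, each into the
    # fold that currently holds the fewest samples.
    folds: List[List[int]] = [[] for _ in range(k)]
    for site in sorted(by_site, key=lambda s: len(by_site[s]), reverse=True):
        min(folds, key=len).extend(by_site[site])

    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, val))
    return splits
```

For example, `site_grouped_folds(["A", "A", "B", "B", "B", "C", "D"], k=2)` places sites B and D in one fold and sites A and C in the other, so no site contributes to both training and validation in either split.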
“…Batch effects in training, validation and testing must be accounted for to ensure equitable application of DL. Batch effects lead to overoptimistic estimates of model performance, and methods not only to palliate but to directly abrogate this bias are needed [19].…”
Section: Introduction (mentioning, confidence: 99%)