The Impact of Digital Histopathology Batch Effect on Deep Learning Model Accuracy and Bias

Howard, Frederick M.; Dolezal, James M.; Kochanny, Sara; Schulte, Jefree J.; Chen, Heather; Heij, Lara; Huo, Dezheng; Nanda, Rita; Olopade, Olufunmilayo I.; Kather, Jakob Nikolas; Grossman, Robert L.; Pearson, Alexander T.

doi:10.1101/2020.12.03.410845

Cited by 17 publications

(12 citation statements)

References 63 publications

“…Additionally, inter-scanner variability may further exacerbate such biases. Recent research also suggests that WSIs preserve site-specific information which can be learned by a deep learning algorithm, resulting in overestimation of model performance [23]. Stain colour normalisation aims to mitigate such batch effects by transforming pixel values from different WSIs within a data set to a common distribution.…”

Section: Colour Normalisation and Augmentation (mentioning)

confidence: 99%

Deep Learning of Histopathological Features for the Prediction of Tumour Molecular Genetics

et al. 2021

Advanced diagnostics are enabling cancer treatments to become increasingly tailored to the individual through developments in immunotherapies and targeted therapies. However, long turnaround times and high costs of molecular testing hinder the widespread implementation of targeted cancer treatments. Meanwhile, gold-standard histopathological assessment carried out by a trained pathologist is widely regarded as routine and mandatory in most cancers. Recently, methods have been developed to mine hidden information from histopathological slides using deep learning applied to scanned and digitized slides; deep learning comprises a collection of computational methods which learn patterns in data in order to make predictions. Such methods have been reported to be successful in a variety of cancers for predicting the presence of biomarkers such as driver mutations, tumour mutational burden, and microsatellite instability. This information could prove valuable to pathologists and oncologists in clinical decision making for cancer treatment and triage for in-depth sequencing. In addition to identifying molecular features, deep learning has been applied to predict prognosis and treatment response in certain cancers. Despite reported successes, many challenges remain before the clinical implementation of such diagnostic strategies in the clinical setting is possible. This review aims to outline recent developments in the field of deep learning for predicting molecular genetics from histopathological slides, as well as to highlight limitations and pitfalls of working with histopathology slides in deep learning.
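The citation statement above describes stain colour normalisation as transforming pixel values from different WSIs to a common distribution. The following is a minimal sketch of that idea, assuming the simplest possible form: each RGB channel of a source image is shifted and scaled so that its mean and standard deviation match those of a reference image. The function name `match_channel_stats` and the list-of-tuples pixel format are hypothetical, chosen for illustration only; published methods usually work in a perceptual or stain-specific colour space rather than raw RGB, and this sketch is not the method of any paper cited here.

```python
from statistics import mean, pstdev


def match_channel_stats(source, target):
    """Shift and scale each colour channel of `source` so that its mean and
    standard deviation match those of `target`.

    Both arguments are lists of (r, g, b) pixel tuples with values in 0..255.
    Illustrative simplification: operates directly on RGB values.
    """
    normalised_channels = []
    for channel in range(3):
        src = [pixel[channel] for pixel in source]
        tgt = [pixel[channel] for pixel in target]
        src_mean, src_std = mean(src), pstdev(src)
        tgt_mean, tgt_std = mean(tgt), pstdev(tgt)
        # A flat channel has no spread to rescale, so it is only shifted.
        scale = tgt_std / src_std if src_std > 0 else 1.0
        normalised_channels.append(
            # Clip to the valid range; clipping can slightly alter the statistics.
            [min(255.0, max(0.0, (v - src_mean) * scale + tgt_mean)) for v in src]
        )
    # Re-interleave the three channels into (r, g, b) pixel tuples.
    return list(zip(*normalised_channels))


if __name__ == "__main__":
    slide_a = [(100, 50, 200), (120, 70, 220)]    # hypothetical source pixels
    reference = [(150, 100, 100), (170, 140, 120)]  # hypothetical reference pixels
    print(match_channel_stats(slide_a, reference))
```

Matching first- and second-order colour statistics removes only part of the site signal; as the statement itself notes, site-specific information can survive in the slide and still be learned by a model.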

“…For CV applications, lack of model generalizability is often a result of the effect of spurious correlates introduced as a result of the WSI preparation process, also known as batch effects [12,13,14]. Mitigating all forms of batch effects parametrically incurs challenges since batch effects may arise from different parts of the tissue preprocessing pipeline such as the scanner acquisition protocol, slide preparation date and thickness of tissue sections [15,16,17,18]. These batch effects remain detectable by machine learning algorithms and can induce spurious correlates.…”

Section: Spurious Confounders In Digital Pathology (mentioning)

confidence: 99%

“…Mitigating all forms of batch effects parametrically incurs challenges since batch effects may arise from different parts of the tissue pre-processing pipeline [9]. Being able to predict artifacts of the scan, such as scanner manufacturer and acquisition protocol [10], slide preparation date, source site from which the scan was taken, [10][11] and image quality [12] can induce spurious correlates. Models are likely to learn these spurious correlates when trained to near-zero training error in the weak-label regime [13].…”

Section: Introduction (mentioning)

confidence: 99%

Examining Batch Effect in Histopathology as a Distributionally Robust Optimization Problem

Hari, Nyman, Mehta, et al. 2021

Preprint

Computer vision (CV) approaches applied to digital pathology have informed biological discovery and development of tools to help inform clinical decision-making. However, batch effects in the images represent a major challenge to effective analysis and interpretation of these data. The standard methods to circumvent learning such confounders include (i) application of image augmentation techniques and (ii) examination of the learning process by evaluating through external validation (e.g., unseen data coming from a comparable dataset collected at another hospital). Here, we show that the source site of a histopathology slide can be learned from the image using CV algorithms in spite of image augmentation, and we explore these source site predictions using interpretability tools. A CV model trained using Empirical Risk Minimization (ERM) risks learning this signal as a spurious correlate in the weak-label regime, which we abate by using a Distributionally Robust Optimization (DRO) method with abstention. We find that the model trained using DRO outperforms a model trained using ERM by 9.9, 13 and 15% in identifying tumor versus normal tissue in Lung Adenocarcinoma, Gleason score in Prostate Adenocarcinoma, and tumor tissue grade in clear cell renal cell carcinoma. Further, by examining the areas abstained by the model, we find that the model trained using a DRO method is more robust to heterogeneity and artifacts in the tissue. We believe that a DRO method trained with abstention may offer novel insights into relevant areas of the tissue contributing to a particular phenotype. Together, we suggest using data augmentation methods that help mitigate a digital pathology model's reliance on spurious visual features, as well as selecting models that are more robust to spurious features for translational discovery and clinical decision support.
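The contrast the abstract draws between Empirical Risk Minimization and Distributionally Robust Optimization can be illustrated with a small sketch. It assumes a group-based formulation in which each sample carries a group label (here, hypothetically, the source site) and the robust objective is the mean loss of the worst-performing group. That grouping, the function names, and the example losses are assumptions made for illustration; the sketch does not implement abstention and should not be read as the authors' method.

```python
from collections import defaultdict


def erm_risk(losses):
    """Empirical risk: the plain average of per-sample losses."""
    return sum(losses) / len(losses)


def worst_group_risk(losses, groups):
    """Group-robust risk: the mean loss of the worst-performing group.

    Returns a (group_label, mean_loss) tuple for that group.
    """
    per_group = defaultdict(list)
    for loss, group in zip(losses, groups):
        per_group[group].append(loss)
    group_means = {g: sum(v) / len(v) for g, v in per_group.items()}
    worst = max(group_means, key=group_means.get)
    return worst, group_means[worst]


if __name__ == "__main__":
    # Hypothetical per-sample losses: three slides from one site, one from another.
    losses = [0.1, 0.1, 0.1, 0.9]
    sites = ["site_a", "site_a", "site_a", "site_b"]
    print("ERM risk:", erm_risk(losses))
    print("Worst-group risk:", worst_group_risk(losses, sites))
```

In this toy example the average loss looks low because the majority site dominates it, while the worst-group risk exposes the poorly served minority site. Minimising the latter is one way a model can be discouraged from relying on signal that only helps on the dominant group.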

“…Nonetheless, compiling such international large scale datasets is costly and unfeasible in most cases and does not eliminate batch effects. Regardless of the dataset size and number of contributing sites, the propensity for overfitting of digital histology models to site level characteristics is incompletely characterized and is infrequently accounted for in internal validation of deep learning models [19]. For example assessments of stain normalization and augmentation techniques have focused on the performance of models in validation sets, rather than true elimination of batch effect [15, 20].…”

Section: Introduction (mentioning)

confidence: 99%

“…Batch effects in training, validation and testing, must be accounted for to ensure equitable application of DL. Batch effects leads to overoptimistic estimates of model performance and methods to not only palliate but to directly abrogate this bias are needed [19].…”

Section: Introduction (mentioning)

confidence: 99%

xDEEP-MSI: Explainable Bias-Rejecting Microsatellite Instability Deep Learning System in Colorectal Cancer

Bustos, Payá, Torrubia, et al. 2021

Biomolecules

The prediction of microsatellite instability (MSI) using deep learning (DL) techniques could have significant benefits, including reducing cost and increasing MSI testing of colorectal cancer (CRC) patients. Nonetheless, batch effects or systematic biases are not well characterized in digital histology models and lead to overoptimistic estimates of model performance. Methods to not only palliate but to directly abrogate biases are needed. We present a multiple bias rejecting DL system based on adversarial networks for the prediction of MSI in CRC from tissue microarrays (TMAs), trained and validated in 1788 patients from EPICOLON and HGUA. The system consists of an end-to-end image preprocessing module that tile samples at multiple magnifications and a tissue classification module linked to the bias-rejecting MSI predictor. We detected three biases associated with the learned representations of a baseline model: the project of origin of samples, the patient’s spot and the TMA glass where each spot was placed. The system was trained to directly avoid learning the batch effects of those variables. The learned features from the bias-ablated model achieved maximum discriminative power with respect to the task and minimal statistical mean dependence with the biases. The impact of different magnifications, types of tissues and the model performance at tile vs patient level is analyzed. The AUC at tile level, and including all three selected tissues (tumor epithelium, mucin and lymphocytic regions) and 4 magnifications, was 0.87 ± 0.03 and increased to 0.9 ± 0.03 at patient level. To the best of our knowledge, this is the first work that incorporates a multiple bias ablation technique at the DL architecture in digital pathology, and the first using TMAs for the MSI prediction task.
