Can You Trust Predictive Uncertainty Under Real Dataset Shifts in Digital Pathology?

Thagaard, Jeppe; Hauberg, Søren; Vegt, Bert van der; Ebstrup, Thomas; Hansen, Johan Damgaard; Dahl, Anders Bjorholm

doi:10.1007/978-3-030-59710-8_80

Cited by 26 publications (27 citation statements)

References 9 publications

“…Clinical models trained on one hospital or region typically degrade in performance in the presence of domain shift [5,21,50,63,73,74,77,84]. In this paper, we evaluated the performance of eight domain generalization methods on their ability to generalize to an unseen test environment for typical clinical datasets.…”

Section: Discussion (mentioning, confidence: 99%)

An empirical framework for domain generalization in clinical settings

Zhang, Dullerud, Seyyed-Kalantari, et al. 2021

Proceedings of the Conference on Health, Inference, and Learning

Clinical machine learning models experience significantly degraded performance in datasets not seen during training, e.g., new hospitals or populations. Recent developments in domain generalization offer a promising solution to this problem by creating models that learn invariances across environments. In this work, we benchmark the performance of eight domain generalization methods on multi-site clinical time series and medical imaging data. We introduce a framework to induce synthetic but realistic domain shifts and sampling bias to stress-test these methods over existing nonhealthcare benchmarks. We find that current domain generalization methods do not achieve significant gains in out-of-distribution performance over empirical risk minimization on real-world medical imaging data, in line with prior work on general imaging datasets. However, a subset of realistic induced-shift scenarios in clinical time series data exhibit limited performance gains. We characterize these scenarios in detail, and recommend best practices for domain generalization in the clinical setting. CCS CONCEPTS: • Computing methodologies → Machine learning; • Applied computing → Health informatics; • General and reference → Empirical studies.

“…Prior work has found significant decreases in model performance under the presence of cross-institutional domain shift, in the chest X-ray [21,63,84], MRI [5,50], and pathology [73,74,77] settings. Temporal domain shifts have also been found to reduce performance in clinical machine learning models [53].…”

Section: Introduction (mentioning, confidence: 99%)

An empirical framework for domain generalization in clinical settings

Zhang, Dullerud, Seyyed-Kalantari, et al. 2021

Proceedings of the Conference on Health, Inference, and Learning

“…First, even though our models show good generalizability on the retrospective cohort (n = 480 WSIs), we developed them on a limited number of cases. This means that the models might not perform optimally on another study cohort from a different site with a distributional shift in, e.g., preanalytical protocols, staining protocol, or scanner type [56, 57]. Future development of our approach should extend the development dataset of both tissue- and cell-level models to be multi-institutional, covering the innate variability of the above-mentioned factors.…”

Section: Discussion (mentioning, confidence: 99%)

Automated Quantification of sTIL Density with H&E-Based Digital Image Analysis Has Prognostic Potential in Triple-Negative Breast Cancers

Thagaard, Stovgaard, Vognsen, et al. 2021

Cancers

Self Cite

Triple-negative breast cancer (TNBC) is an aggressive and difficult-to-treat cancer type that represents approximately 15% of all breast cancers. Recently, stromal tumor-infiltrating lymphocytes (sTIL) resurfaced as a strong prognostic biomarker for overall survival (OS) for TNBC patients. Manual assessment has innate limitations that hinder clinical adoption, and the International Immuno-Oncology Biomarker Working Group (TIL-WG) has therefore envisioned that computational assessment of sTIL could overcome these limitations and recommended that any algorithm should follow the manual guidelines where appropriate. However, no existing studies capture all the concepts of the guideline or have shown the same prognostic evidence as manual assessment. In this study, we present a fully automated digital image analysis pipeline and demonstrate that our hematoxylin and eosin (H&E)-based pipeline can provide a quantitative and interpretable score that correlates with the manual pathologist-derived sTIL status, and importantly, can stratify a retrospective cohort into two significant distinct prognostic groups. We found our score to be prognostic for OS (HR: 0.81, CI: 0.72–0.92, p = 0.001) independent of age, tumor size, nodal status, and tumor type in statistical modeling. While prior studies have followed fragments of the TIL-WG guideline, our approach is the first to follow all complex aspects, where appropriate, supporting the TIL-WG vision of computational assessment of sTIL in the future clinical setting.

“…In four different unseen domains, BigAug obtains a comparable performance to the two state-of-the-art methods. Finally, in digital pathology and histopathology, the domain shift effect for deep learning has been studied in Thagaard et al. (2020); Stacke et al. (2019, 2020).…”

Section: Samala (mentioning, confidence: 99%)

Domain generalization in deep learning-based mass detection in mammography: A large-scale multi-center study

Garrucho, Kushibar, Jouide, et al. 2022

Preprint

Computer-aided detection systems based on deep learning have shown a great potential in breast cancer detection. However, the lack of domain generalization of artificial neural networks is an important obstacle to their deployment in changing clinical environments. In this work, we explore the domain generalization of deep learning methods for mass detection in digital mammography and analyze in-depth the sources of domain shift in a large-scale multi-center setting. To this end, we compare the performance of eight state-of-the-art detection methods, including Transformer-based models, trained in a single-domain and tested in five unseen domains. Moreover, a single-source mass detection training pipeline is designed to improve the domain generalization without requiring images from the new domain. The results show that our workflow generalizes better than state-of-the-art transfer learning-based approaches in four out of five domains, while reducing the domain shift caused by the different acquisition protocols and scanner manufacturers. Subsequently, an extensive analysis is performed to identify the covariate shifts with bigger effects on the detection performance, such as due to differences in patient age, breast density, mass size and mass malignancy. Ultimately, this comprehensive study provides key insights and best practices for future research on domain generalization in deep learning-based breast cancer detection.
