The high capacity of neural networks allows models to fit data with high precision, but makes generalization to unseen data a challenge. If a domain shift exists, i.e., differences in image statistics between training and test data, care needs to be taken to ensure reliable deployment in real-world scenarios. In digital pathology, domain shift manifests as differences between whole-slide images, introduced, for example, by variations in the acquisition pipeline between medical centers or over time. To harness the great potential of deep learning in histopathology and ensure consistent model behavior, we need a deeper understanding of domain shift and its consequences, such that a model's predictions on new data can be trusted. This work focuses on the internal representations learned by trained convolutional neural networks, and shows how these can be used to formulate a novel measure, the representation shift, for quantifying the magnitude of model-specific domain shift. We perform a study on domain shift in tumor classification of hematoxylin and eosin stained images, considering different datasets, models, and techniques for preparing data in order to reduce the domain shift. The results show that the proposed measure correlates strongly with the drop in performance when a model is tested across a large number of different types of domain shift, and that it improves on existing techniques for measuring data shift and uncertainty. The proposed measure can reveal how sensitive a model is to domain variations, and can be used to detect new data that a model will have problems generalizing to. We see techniques for measuring, understanding, and overcoming domain shift as a crucial step towards reliable use of deep learning in future clinical pathology applications.
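To make the idea concrete, below is a minimal sketch of how a representation-shift-style measure could be computed: per-filter activation distributions of a chosen convolutional layer are collected on the training (source) data and on new (target) data, then compared with a Wasserstein distance averaged over filters. This is an illustrative approximation under assumptions, not the paper's exact implementation; the names `model`, `layer`, `source_loader`, and `target_loader` are placeholders, and the choice of layer and distance function are assumptions.

```python
# Sketch of a representation-shift-style measure: compare the distributions
# of per-filter activations a trained CNN produces on its training data
# versus on new (possibly shifted) data. `model`, `layer`, and the data
# loaders are hypothetical placeholders.
import torch
import numpy as np
from scipy.stats import wasserstein_distance

def mean_filter_activations(model, layer, loader, device="cpu"):
    """Spatially averaged activation of each filter in `layer` for every
    image in `loader`; returns an array of shape (n_images, n_filters)."""
    feats = []
    hook = layer.register_forward_hook(
        lambda module, inputs, output: feats.append(
            output.mean(dim=(2, 3)).detach().cpu()  # average over H, W
        )
    )
    model.eval()
    with torch.no_grad():
        for images, _ in loader:
            model(images.to(device))
    hook.remove()
    return torch.cat(feats).numpy()

def representation_shift(model, layer, source_loader, target_loader):
    """Per-filter Wasserstein distance between activation distributions on
    source and target data, averaged over all filters in the layer."""
    src = mean_filter_activations(model, layer, source_loader)
    tgt = mean_filter_activations(model, layer, target_loader)
    return float(np.mean([
        wasserstein_distance(src[:, k], tgt[:, k]) for k in range(src.shape[1])
    ]))
```

A value that is large relative to the variation observed between held-out splits of the training data would then flag target data the model may struggle to generalize to.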
Unsupervised learning has made substantial progress over the last few years, especially by means of contrastive self-supervised learning. The dominant dataset for benchmarking self-supervised learning has been ImageNet, on which recent methods are approaching the performance achieved by fully supervised training. The ImageNet dataset is, however, largely object-centric, and it is not yet clear what potential these methods have on widely different datasets and tasks that are not object-centric, such as in digital pathology. While self-supervised learning has started to be explored within this area with encouraging results, there is reason to look closer at how this setting differs from natural images and ImageNet. In this paper we make an in-depth analysis of contrastive learning for histopathology, pinpointing how the contrastive objective behaves differently due to the characteristics of histopathology data. We bring forward a number of considerations, such as view generation for the contrastive objective and hyper-parameter tuning. In a large battery of experiments, we analyze how downstream performance in tissue classification is affected by these considerations. The results show that contrastive learning can reduce the annotation effort within digital pathology, but that the specific dataset characteristics need to be considered. To take full advantage of the contrastive learning objective, different calibrations of view generation and hyper-parameters are required. Our results pave the way for realizing the full potential of self-supervised learning for histopathology applications. Code and trained models are available at https://github.com/k-stacke/ssl-pathology.
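As a hedged sketch of the contrastive objective in question, the following implements a SimCLR-style NT-Xent loss, where `z1` and `z2` are projection-head outputs for two augmented views of the same batch of patches; the temperature is one of the hyper-parameters whose calibration the abstract argues is dataset-dependent. The augmentation pipeline is illustrative only; which view-generating transforms suit non-object-centric H&E patches is exactly the kind of consideration the paper analyzes, and the specific strengths here are assumptions.

```python
# Sketch of a SimCLR-style contrastive setup (illustrative, not the
# paper's exact configuration). z1, z2: (N, d) projected embeddings of
# two augmented views of the same N image patches.
import torch
import torch.nn.functional as F
from torchvision import transforms

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent: pull each view toward its positive pair, push it away
    from the 2N - 2 other samples in the batch."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, d), unit norm
    sim = z @ z.T / temperature                   # cosine-similarity logits
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    # The positive for sample i is its other view: i + n for i < n, else i - n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Example view generation: flips in all orientations are natural for tissue,
# which has no canonical "up", while crop scale and color-jitter strength
# interact with stain variation and need dataset-specific calibration.
patch_views = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])
```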
Artificial intelligence (AI) holds much promise for enabling highly desired improvements in imaging diagnostics. One of the most limiting bottlenecks for the development of useful clinical-grade AI models is the lack of training data: both the large number of cases needed and the necessity of high-quality ground-truth annotation. The aim of this project was to establish and describe the construction of a database with substantial amounts of detail-annotated oncology imaging data from pathology and radiology. A specific objective was to be proactive, that is, to support as-yet-undefined subsequent AI training across a wide range of tasks, such as detection, quantification, segmentation, and classification, which puts particular focus on the quality and generality of the annotations. The main outcome of this project was the database itself, with a collection of labeled image data from breast, ovary, skin, colon, skeleton, and liver. In addition, this effort served as an exploration of best practices for further scalability of high-quality image collections, and a main contribution of the study was the generic lessons learned regarding how to successfully organize efforts to construct medical imaging databases for AI training, summarized as eight guiding principles covering team, process, and execution aspects.