Recommendations on compiling test datasets for evaluating artificial intelligence solutions in pathology

Homeyer, André; Geißler, Christian; Schwen, Lars Ole; Zakrzewski, Falk; Evans, Theodore; Strohmenger, Klaus; Westphal, Max; Bülow, Roman David; Kargl, Michaela; Karjauv, Aray; Munné-Bertran, Isidre; Retzlaff, Carl Orge; Romero-López, Adrià; Sołtysiński, Tomasz; Plass, Markus; Carvalho, Rodrigo; Steinbach, Peter; Lan, Yu-Chia; Bouteldja, Nassim; Haber, David; Rojas-Carulla, Mateo; Sadr, Alireza Vafaei; Kraft, Matthias; Krüger, Daniel; Fick, Rutger; Lang, Tobias; Müller, Heimo; Hufnagl, Peter; Zerbe, Norman

doi:10.1038/s41379-022-01147-y

Cited by 35 publications

(42 citation statements)

References 136 publications

(168 reference statements)

“…Like other ensemble methods ( 6 , 31 ), our NoisyEnsembles also have the potential to increase the overall performance of the deep learning application ( Figures 4A,B vs. Supplementary Figure S11 ). For testing, it is recommended to sample the possible image space as well as possible ( 11 ). However, during the application of CNNs in a routine setting, we recommend paying attention to the best possible quality, as the performance was directly linked to the tissue quality ( Figure 2 , Supplementary Figure S1 ).…”

Section: Discussion (mentioning)

confidence: 99%

“…It was not until 2021 that the first AI algorithm in computational pathology was approved by the Food and Drug Administration (FDA) ( 9 ). A major problem in overcoming this “translational valley of death” is to ensure the reproducibility and generalizability of the developed products ( 10 ) by defining appropriate test datasets ( 11 ). All of these applications and algorithms depend, of course, on the digitalized histomorphological whole slide image (WSI) used in training and during application.…”

Section: Introduction (mentioning)

confidence: 99%

How to learn with intentional mistakes: NoisyEnsembles to overcome poor tissue quality for deep learning in computational pathology

et al. 2022

There is a lot of recent interest in the field of computational pathology, as many algorithms are introduced to detect, for example, cancer lesions or molecular features. However, there is a large gap between artificial intelligence (AI) technology and practice, since only a small fraction of the applications is used in routine diagnostics. The main problems are the transferability of convolutional neural network (CNN) models to data from other sources and the identification of uncertain predictions. The role of tissue quality itself is also largely unknown. Here, we demonstrated that samples of the TCGA ovarian cancer (TCGA-OV) dataset from different tissue sources have different quality characteristics and that CNN performance is linked to this property. CNNs performed best on high-quality data. Quality control tools were partially able to identify low-quality tiles, but their use did not increase the performance of the trained CNNs. Furthermore, we trained NoisyEnsembles by introducing label noise during training. These NoisyEnsembles could improve CNN performance for low-quality, unknown datasets. Moreover, the performance increases as the ensemble becomes more consistent, suggesting that incorrect predictions could be discarded efficiently to avoid wrong diagnostic decisions.
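The closing point of this abstract, that predictions on which the ensemble members disagree could be discarded, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tile labels, and the 0.8 agreement threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Return (majority_label, agreement) for one sample.

    `predictions` holds one predicted label per ensemble member; agreement
    is the fraction of members that voted for the majority label.
    """
    label, votes = Counter(predictions).most_common(1)[0]
    return label, votes / len(predictions)


def predict_or_abstain(predictions, min_agreement=0.8):
    """Return the majority label, or None when agreement is below the threshold."""
    label, agreement = ensemble_vote(predictions)
    return label if agreement >= min_agreement else None


# Hypothetical votes from a five-member ensemble for two image tiles.
consistent_tile = ["tumor", "tumor", "tumor", "tumor", "tumor"]
split_tile = ["tumor", "normal", "tumor", "normal", "normal"]

print(predict_or_abstain(consistent_tile))  # tumor
print(predict_or_abstain(split_tile))       # None
```

A tile that returns `None` would be flagged as uncertain rather than reported, trading coverage for fewer wrong calls; where to set the threshold is an assumption that would need to be validated on held-out data.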

“…The data are frequently labelled, either by labelling a WSI with the diagnosis or smaller areas within the WSI, giving information about the region of interest 3 . The algorithm's performance is compared to a defined reference (usually a pathologist's diagnosis) in terms of performance 15 …”

Section: Why Do Errors Arise in AI Diagnostic Tools? (mentioning)

confidence: 99%

“…This creates bias and an increased risk of errors in these subgroups 9,31–33 . This can also cause the problem of hidden stratification; when an algorithm appears to perform well across the whole population, but is actually performing poorly in subsets not identified during training or testing, and this niche of poor performance goes undetected 11,12,15 . For example, an algorithm may generally be effective at lung cancer detection, but consistently miss a rare subtype 12 .…”

Section: Why Do Errors Arise in AI Diagnostic Tools? (mentioning)

confidence: 99%
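The hidden stratification described in the statement above can be made concrete with a small sketch that compares overall sensitivity with sensitivity per subgroup. The data are synthetic and the field names (`subtype`, `label`, `prediction`) and helper functions are hypothetical, chosen only to mirror the lung cancer example in the quote.

```python
from collections import defaultdict


def sensitivity(records):
    """Fraction of positive cases the algorithm detected (None if no positives)."""
    positives = [r for r in records if r["label"] == 1]
    if not positives:
        return None
    detected = sum(1 for r in positives if r["prediction"] == 1)
    return detected / len(positives)


def sensitivity_by_subgroup(records, key="subtype"):
    """Compute sensitivity separately for each value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: sensitivity(rows) for name, rows in groups.items()}


# Synthetic test set: 90 cases of a common subtype (88 detected)
# and 10 cases of a rare subtype (3 detected). All cases are positive.
records = (
    [{"subtype": "common", "label": 1, "prediction": 1}] * 88
    + [{"subtype": "common", "label": 1, "prediction": 0}] * 2
    + [{"subtype": "rare", "label": 1, "prediction": 1}] * 3
    + [{"subtype": "rare", "label": 1, "prediction": 0}] * 7
)

print("overall:", sensitivity(records))
for name, value in sensitivity_by_subgroup(records).items():
    print(name, round(value, 3))
```

In this synthetic case the overall figure looks acceptable while the rare subtype is missed most of the time, which is only visible once the test set records the subgroup and contains enough cases of it to measure.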

Why do errors arise in artificial intelligence diagnostic tools in histopathology and how can we minimize them?

Evans, Snead 2023, Histopathology

Artificial intelligence (AI)‐based diagnostic tools can offer numerous benefits to the field of histopathology, including improved diagnostic accuracy, efficiency and productivity. As a result, such tools are likely to have an increasing role in routine practice. However, all AI tools are prone to errors, and these AI‐associated errors have been identified as a major risk in the introduction of AI into healthcare. The errors made by AI tools are different, in terms of both cause and nature, to the errors made by human pathologists. As highlighted by the National Institute for Health and Care Excellence, it is imperative that practising pathologists understand the potential limitations of AI tools, including the errors made. Pathologists are in a unique position to be gatekeepers of AI tool use, maximizing patient benefit while minimizing harm. Furthermore, their pathological knowledge is essential to understanding when, and why, errors have occurred and so to developing safer future algorithms. This paper summarises the literature on errors made by AI diagnostic tools in histopathology. These include erroneous errors, data concerns (data bias, hidden stratification, data imbalances, distributional shift, and lack of generalisability), reinforcement of outdated practices, unsafe failure mode, automation bias, and insensitivity to impact. Methods to reduce errors in both tool design and clinical use are discussed, and the practical roles for pathologists in error minimisation are highlighted. This aims to inform and empower pathologists to move safely through this seismic change in practice and help ensure that novel AI tools are adopted safely.

“…Clinical validation is necessary for any SaMD ( Fraggetta et al, 2021 ) as determined by the manufacturer before (pre-market) and after (post-market) distribution to establish a relationship between verification and validation results of an algorithm and the clinical conditions of interest ( Carolan et al, 2022 ). Prior to routine use, it is important to evaluate solutions that automatically extract information from digital histology images, and their predictive performance ( Homeyer et al, 2022 ). Various other technical and business challenges must similarly be overcome to commercialize digital pathology solutions ( Kearney et al, 2021 ; Lujan et al, 2021 ).…”

Section: Regulatory Standards (mentioning)

confidence: 99%

Companion diagnostic requirements for spatial biology using multiplex immunofluorescence and multispectral imaging

Locke, Hoyt 2023, Front. Mol. Biosci.

Immunohistochemistry has long been held as the gold standard for understanding the expression patterns of therapeutically relevant proteins to identify prognostic and predictive biomarkers. Patient selection for targeted therapy in oncology has successfully relied upon standard microscopy-based methodologies, such as single-marker brightfield chromogenic immunohistochemistry. As promising as these results are, the analysis of one protein, with few exceptions, no longer provides enough information to draw effective conclusions about the probability of treatment response. More multifaceted scientific queries have driven the development of high-throughput and high-order technologies to interrogate biomarker expression patterns and spatial interactions between cell phenotypes in the tumor microenvironment. Such multi-parameter data analysis has been historically reserved for technologies that lack the spatial context that is provided by immunohistochemistry. Over the past decade, technical developments in multiplex fluorescence immunohistochemistry and discoveries made with improving image data analysis platforms have highlighted the importance of spatial relationships between certain biomarkers in understanding a patient’s likelihood to respond to, typically, immune checkpoint inhibitors. At the same time, personalized medicine has instigated changes in both clinical trial design and its conduct in a push to make drug development and cancer treatment more efficient, precise, and economical. Precision medicine in immuno-oncology is being steered by data-driven approaches to gain insight into the tumor and its dynamic interaction with the immune system. This is particularly necessary given the rapid growth in the number of trials involving more than one immune checkpoint drug, and/or using those in combination with conventional cancer treatments. 
As multiplex methods, like immunofluorescence, push the boundaries of immunohistochemistry, it becomes critical to understand the foundation of this technology and how it can be deployed for use as a regulated test to identify the prospect of response from mono- and combination therapies. To that end, this work will focus on: 1) the scientific, clinical, and economic requirements for developing clinical multiplex immunofluorescence assays; 2) the attributes of the Akoya Phenoptics workflow to support predictive tests, including design principles, verification, and validation needs; 3) regulatory, safety and quality considerations; 4) application of multiplex immunohistochemistry through lab-developed-tests and regulated in vitro diagnostic devices.
