The Challenge Dataset – simple evaluation for safe, transparent healthcare AI deployment

Sanayei, James K; Abdalla, Mohamed; Ahluwalia, Monish; Seyyed-Kalantari, Laleh; Minotti, Simona C.; Fine, Benjamin

doi:10.1101/2022.12.15.22280619

Cited by 3 publications

(2 citation statements)

References 35 publications


“…Typically in the development of AI diagnostic tools, the model is first trained on one dataset, referred to as the training dataset. The model is then subsequently tested on a second testing dataset, comprising data unseen during the training stage for external validation 14 . It is well understood that these two developmental datasets should be different in order to assess the algorithm's performance on unseen data, and ideally some of the test dataset should be from external sources.…”

Section: Why Do Errors Arise in AI Diagnostic Tools? (mentioning)

confidence: 99%

Why do errors arise in artificial intelligence diagnostic tools in histopathology and how can we minimize them?

Evans, Snead, 2023, Histopathology


Artificial intelligence (AI)‐based diagnostic tools can offer numerous benefits to the field of histopathology, including improved diagnostic accuracy, efficiency and productivity. As a result, such tools are likely to have an increasing role in routine practice. However, all AI tools are prone to errors, and these AI‐associated errors have been identified as a major risk in the introduction of AI into healthcare. The errors made by AI tools are different, in terms of both cause and nature, to the errors made by human pathologists. As highlighted by the National Institute for Health and Care Excellence, it is imperative that practising pathologists understand the potential limitations of AI tools, including the errors made. Pathologists are in a unique position to be gatekeepers of AI tool use, maximizing patient benefit while minimizing harm. Furthermore, their pathological knowledge is essential to understanding when, and why, errors have occurred and so to developing safer future algorithms. This paper summarises the literature on errors made by AI diagnostic tools in histopathology. These include erroneous errors, data concerns (data bias, hidden stratification, data imbalances, distributional shift, and lack of generalisability), reinforcement of outdated practices, unsafe failure mode, automation bias, and insensitivity to impact. Methods to reduce errors in both tool design and clinical use are discussed, and the practical roles for pathologists in error minimisation are highlighted. This aims to inform and empower pathologists to move safely through this seismic change in practice and help ensure that novel AI tools are adopted safely.


“…An edge case is a specific scenario that occurs only at an extreme (usually the maximum or minimum) of operating parameter—the computer science equivalent of a ‘rare presentation’. By assessing the performance or understanding the output of the method or algorithm to ‘rare presentations’, developers begin to understand the limitations of the aforementioned methods/algorithms and can begin to improve them 1. Thus, while upper gastrointestinal (GI) bleeding is common (annual incidence of 1 in 1000),2 exploring the diagnostic odyssey of a case study of haemosuccus pancreaticus (a rare cause of bleeding, <1% of upper GI bleeds)3 reveals both the strengths and weakness of current diagnostic approaches to GI bleeding management 4–6.…”

Section: Introduction (mentioning)

confidence: 99%

Haemosuccus pancreaticus and seven episodes of recurrent unlocalised upper gastrointestinal bleeding

Abdalla, Panda, et al., 2024, BMJ Case Rep


Upper gastrointestinal (GI) bleeding is a common medical condition that results in extensive morbidity and mortality, as well as substantial healthcare costs. While there is variation among society and consensus guidelines, the approaches to assessment and evaluation are generally consistent. Our case describes a man in his 40s who presented with seven episodes of recurrent upper GI bleeding over 2 years secondary to haemosuccus pancreaticus. While rare, this case study highlights key principles to the initial diagnostic approach that, in appropriate clinical contexts, should be applied to patients with unlocalised upper GI bleeding. We further perform a complete systematic review of similar cases available in PubMed (36 patients in 24 case reports) to further refine these diagnostic principles.


Empirical data drift detection experiments on real-world medical imaging data

Kore, Abbasi Bavil, Subasri, et al., 2024, Nat Commun


While it is common to monitor deployed clinical artificial intelligence (AI) models for performance degradation, it is less common for the input data to be monitored for data drift – systemic changes to input distributions. However, when real-time evaluation may not be practical (eg., labeling costs) or when gold-labels are automatically generated, we argue that tracking data drift becomes a vital addition for AI deployments. In this work, we perform empirical experiments on real-world medical imaging to evaluate three data drift detection methods’ ability to detect data drift caused (a) naturally (emergence of COVID-19 in X-rays) and (b) synthetically. We find that monitoring performance alone is not a good proxy for detecting data drift and that drift-detection heavily depends on sample size and patient features. Our work discusses the need and utility of data drift detection in various scenarios and highlights gaps in knowledge for the practical application of existing methods.
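To make the idea of input-data drift concrete, here is a minimal sketch of one generic check: a two-sample Kolmogorov-Smirnov statistic comparing a reference sample of a single numeric feature against a newer sample. This is an illustration only. The abstract does not name the three detection methods the authors evaluated, so this should not be read as their approach. The function names `ks_statistic` and `drift_flag`, the threshold of 0.2, and the synthetic Gaussian data are all assumptions made for the example.

```python
from bisect import bisect_right
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def drift_flag(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds an (arbitrary, assumed) threshold."""
    return ks_statistic(reference, current) > threshold


# Synthetic stand-ins for one input feature (hypothetical data, fixed seed).
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
same = [rng.gauss(0.0, 1.0) for _ in range(500)]      # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]   # mean shifted by one SD

print("no shift:", ks_statistic(reference, same))
print("shifted: ", ks_statistic(reference, shifted))
```

Note that the check needs no labels, which is the situation the abstract describes when real-time evaluation is impractical. With small samples the statistic becomes noisy, in line with the abstract's observation that drift detection depends heavily on sample size; a fixed threshold like the one assumed here would need to be calibrated per feature and per sample size in practice.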

