2022
DOI: 10.1101/2022.12.15.22280619
Preprint

The Challenge Dataset – simple evaluation for safe, transparent healthcare AI deployment

Abstract: In this paper, we demonstrate the use of a "Challenge Dataset": a small, site-specific, manually curated dataset - enriched with uncommon, risk-exposing, and clinically important edge cases - that can facilitate pre-deployment evaluation and identification of clinically relevant AI performance deficits. The five major steps of the Challenge Dataset process are described in detail, including defining use cases, edge case selection, dataset size determination, dataset compilation, and model evaluation. Evaluatin…

Cited by 3 publications (2 citation statements)
References 35 publications
“…Typically in the development of AI diagnostic tools, the model is first trained on one dataset, referred to as the training dataset. The model is then subsequently tested on a second testing dataset, comprising data unseen during the training stage for external validation 14. It is well understood that these two developmental datasets should be different in order to assess the algorithm's performance on unseen data, and ideally some of the test dataset should be from external sources.…”
Section: Why Do Errors Arise In AI Diagnostic Tools? | Citation type: mentioning
confidence: 99%
“…An edge case is a specific scenario that occurs only at an extreme (usually the maximum or minimum) of operating parameter—the computer science equivalent of a ‘rare presentation’. By assessing the performance or understanding the output of the method or algorithm to ‘rare presentations’, developers begin to understand the limitations of the aforementioned methods/algorithms and can begin to improve them 1. Thus, while upper gastrointestinal (GI) bleeding is common (annual incidence of 1 in 1000),2 exploring the diagnostic odyssey of a case study of haemosuccus pancreaticus (a rare cause of bleeding, <1% of upper GI bleeds)3 reveals both the strengths and weakness of current diagnostic approaches to GI bleeding management 4–6.…”
Section: Introduction | Citation type: mentioning
confidence: 99%