2021
DOI: 10.2196/23436

Hidden Variables in Deep Learning Digital Pathology and Their Potential to Cause Batch Effects: Prediction Model Study

Abstract: Background: An increasing number of studies within digital pathology show the potential of artificial intelligence (AI) to diagnose cancer using histological whole slide images, which requires large and diverse data sets. While diversification may result in more generalizable AI-based systems, it can also introduce hidden variables. If neural networks are able to distinguish/learn hidden variables, these variables can introduce batch effects that compromise the accuracy of classification systems. …

Cited by 42 publications (55 citation statements)
References 23 publications (14 reference statements)
“…While CNN-based image analysis has advantages over human observation with respect to objective and quantitative feature extraction, an obvious drawback is that in contrast to human experts, CNNs have difficulty distinguishing biologically significant features from insignificant features and artifacts. Depending on the data set that is used for CNN training, spurious and unwanted correlations within the training set can be picked up and hamper generalization [19,20,46]. Moreover, deceptively created input images specifically designed to fool a CNN (adversarial attacks) have been shown to pose a real threat [21].…”
Section: Introduction (mentioning)
confidence: 99%
“…Various quality control (QC) techniques can be used to overcome preanalytical issues such as variations in slide preparation, origin, and scanner type. One approach is to train individual models of the same architecture to recognize specific variables [73]. Other approaches, such as combining image metrics in a QC application [138], or transformation of image patches with synthetically generated artifacts [139], can be used to train an algorithm to recognize different types of histological artifacts.…”
Section: Adoption of Digital Pathology and AI: Challenges and Future Considerations (mentioning)
confidence: 99%
“…Data used for training need to be accurate and as complete as possible in order to maximize predictability and utility [39]. This can be challenging when histological data are obtained from various laboratories, leading to some variability due to factors such as differences in slide preparation (sectioning, fixation, staining, and mounting) [73], scoring algorithms [18], and inherent inter-observer variability [74]. These challenges become more apparent when more complex computational analytics methods are used for multiplexed imaging.…”
Section: Advances in Computational Approaches: AI and Machine Learning (mentioning)
confidence: 99%
“…Moreover, a separate analysis of the subgroup with T1 tumours would have been very interesting but was not feasible because of the small number of T1 cases. As in all studies using a CNN to extract features from histological slides, hidden variables such as specimen fixation, slide preparation date, slide origin or scanner type in the digital images might cause batch effects and influence prediction accuracy [35]. Finally, we used WSIs of the operation specimens.…”
Section: Strengths and Limitations of the Present Study (mentioning)
confidence: 99%