2021
DOI: 10.1007/978-3-030-68763-2_13

AI Slipping on Tiles: Data Leakage in Digital Pathology

citations
Cited by 23 publications
(29 citation statements)
references
References 32 publications
“…If tiles are randomly assigned, tiles from the same WSI can end up in both the development and the test datasets, possibly inflating performance results. A substantial number of published research studies are affected by this problem [110]. Therefore, to avoid any risk of bias, none of the tiles in a test dataset may originate from the same WSI as the tiles in the development set [110].…”
Section: Independence (mentioning)
confidence: 99%
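
The constraint described in this statement (no WSI may contribute tiles to both the development and the test set) can be illustrated with a slide-level split. The following is a minimal sketch, not the method of the cited paper: the function name `split_tiles_by_slide`, the `tile_to_slide` mapping, and the toy slide and tile identifiers are all hypothetical, and the 20% test fraction is an arbitrary assumption.

```python
import random
from collections import defaultdict


def split_tiles_by_slide(tile_to_slide, test_fraction=0.2, seed=0):
    """Split tiles so that each slide (WSI) contributes to exactly one subset.

    tile_to_slide: dict mapping a tile ID to the ID of its source slide
    (hypothetical input format, assumed for illustration).
    """
    # Group tiles by the slide they were extracted from.
    slide_to_tiles = defaultdict(list)
    for tile, slide in tile_to_slide.items():
        slide_to_tiles[slide].append(tile)

    # Shuffle slides, not tiles: the slide is the unit of assignment.
    slides = sorted(slide_to_tiles)
    random.Random(seed).shuffle(slides)

    # Reserve at least one slide for testing.
    n_test = max(1, round(len(slides) * test_fraction))
    test = [t for s in slides[:n_test] for t in slide_to_tiles[s]]
    dev = [t for s in slides[n_test:] for t in slide_to_tiles[s]]
    return dev, test


# Toy data (hypothetical): 10 slides with 5 tiles each.
tile_to_slide = {
    f"slide{s}_tile{t}": f"slide{s}" for s in range(10) for t in range(5)
}
dev_tiles, test_tiles = split_tiles_by_slide(tile_to_slide)

# Check that no slide appears on both sides of the split.
dev_slides = {tile_to_slide[t] for t in dev_tiles}
test_slides = {tile_to_slide[t] for t in test_tiles}
print(dev_slides.isdisjoint(test_slides))
```

Shuffling tiles directly would be the random assignment the statement warns against. If scikit-learn is available, `GroupShuffleSplit` with the slide IDs passed as `groups` should give an equivalent slide-level split.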
“…To be able to claim that one of the trained models can be considered production ready, the aforementioned optimization processes are not sufficient. There is at least one important factor that could potentially introduce bias to the trained models and that is data leakage, as it is well described by Bussola et al [52]. The final process of this methodology focuses on solving that issue.…”
Section: Production Model Creation (mentioning)
confidence: 99%
“…A similar strategy is also adopted in [34], with the further addition of an attention mechanism. Working with tiles, however, requires careful planning of the model training, not to incur in unwanted biases such as the data (or information) leakage: whenever tiles are extracted from the same WSI in both the training and the validation set, model results are heavily affected by overfitting [35].…”
Section: Digital Pathology and Artificial Intelligence (mentioning)
confidence: 99%
“…Metrics are reported indicating average and standard deviation. Moreover, throughout the model training a particular care has been devoted into avoiding overfitting effects such as data (or information) leakage [35]: tiles extracted from the same WSI were not distributed in different training/test data subsets, a careful approach which is now becoming standard in the most recent works being published [131]. Finally, we adopted a plateau learning rate scheduler acted by monitoring metrics on validation set and reducing the learning rate if no improvements occurred for at least ten epochs: the new learning rate was computed as $\eta_{t+1} = \alpha \eta_t$ with $\alpha = 0.2$.…”
Section: Eunet Training and Evaluation (mentioning)
confidence: 99%
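
The plateau rule quoted in this statement (multiply the learning rate by α = 0.2 after ten epochs without improvement) can be sketched in a few lines. This is an illustrative reading, not the citing authors' code: the class name `PlateauScheduler` is hypothetical, the monitored metric is assumed to be a validation loss where lower is better, and the reduction is assumed to fire on the tenth consecutive stale epoch.

```python
class PlateauScheduler:
    """Reduce the learning rate when a monitored loss stops improving.

    Assumptions (not stated in the quoted text): lower metric is better,
    and the stale-epoch counter resets after each reduction.
    """

    def __init__(self, lr, factor=0.2, patience=10):
        self.lr = lr
        self.factor = factor      # alpha in eta_{t+1} = alpha * eta_t
        self.patience = patience  # stale epochs tolerated before reducing
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# Toy run: the loss improves once, then stays flat for ten epochs.
scheduler = PlateauScheduler(lr=1e-3)
history = [scheduler.step(1.0) for _ in range(11)]
print(history[0], history[-1])
```

In the toy run the rate stays at its initial value through the ninth stale epoch and is multiplied by 0.2 on the tenth. Framework schedulers such as PyTorch's `ReduceLROnPlateau` follow the same idea but may count the patience window slightly differently.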