AI Slipping on Tiles: Data Leakage in Digital Pathology

Bussola, Nicole; Marcolini, Alessia; Maggio, Valerio; Jurman, Giuseppe; Furlanello, Cesare

doi:10.1007/978-3-030-68763-2_13

Cited by 23 publications

(29 citation statements)

References 32 publications

“…If tiles are randomly assigned, tiles from the same WSI can end up in both the development and the test datasets, possibly inflating performance results. A substantial number of published research studies are affected by this problem [110]. Therefore, to avoid any risk of bias, none of the tiles in a test dataset may originate from the same WSI as the tiles in the development set [110].…”

Section: Independence (mentioning)

confidence: 99%

Recommendations on test datasets for evaluating AI solutions in pathology

Homeyer, Geißler, Schwen et al. 2022

Preprint

Artificial intelligence (AI) solutions that automatically extract information from digital histology images have shown great promise for improving pathological diagnosis. Prior to routine use, it is important to evaluate their predictive performance and obtain regulatory approval. This assessment requires appropriate test datasets. However, compiling such datasets is challenging and specific recommendations are missing. A committee of various stakeholders, including commercial AI developers, pathologists, and researchers, discussed key aspects and conducted extensive literature reviews on test datasets in pathology. Here, we summarize the results and derive general recommendations for the collection of test datasets. We address several questions: Which and how many images are needed? How to deal with low-prevalence subsets? How can potential bias be detected? How should datasets be reported? What are the regulatory requirements in different countries? The recommendations are intended to help AI developers demonstrate the utility of their products and to help regulatory agencies and end users verify reported performance measures. Further research is needed to formulate criteria for sufficiently representative test datasets so that AI solutions can operate with less user intervention and better support diagnostic workflows in the future.
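The independence requirement in the citation statement above (no WSI may contribute tiles to both the development and the test set) can be illustrated with a slide-level split. The following is a minimal sketch using only the Python standard library, not the method of any of the cited papers; the function name `split_tiles_by_slide`, the `tile_to_slide` mapping, and the tile and slide identifiers are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def split_tiles_by_slide(tile_to_slide, test_fraction=0.2, seed=0):
    """Assign whole slides, never individual tiles, to development or test.

    tile_to_slide: dict mapping a tile identifier to the identifier of the
    WSI it was extracted from (hypothetical data layout for this sketch).
    Returns two lists of tile identifiers: (dev_tiles, test_tiles).
    """
    # Group tiles by their source slide.
    tiles_by_slide = defaultdict(list)
    for tile_id, slide_id in tile_to_slide.items():
        tiles_by_slide[slide_id].append(tile_id)

    # Shuffle the slide identifiers (not the tiles) reproducibly.
    slide_ids = sorted(tiles_by_slide)
    random.Random(seed).shuffle(slide_ids)

    # Reserve a fraction of the slides, at least one, for testing.
    n_test = max(1, round(len(slide_ids) * test_fraction))
    test_slides = set(slide_ids[:n_test])

    dev_tiles, test_tiles = [], []
    for slide_id, tiles in tiles_by_slide.items():
        target = test_tiles if slide_id in test_slides else dev_tiles
        target.extend(tiles)
    return dev_tiles, test_tiles


# Toy data: 5 slides with 4 tiles each.
tile_to_slide = {f"wsi{s}_tile{t}": f"wsi{s}" for s in range(5) for t in range(4)}
dev_tiles, test_tiles = split_tiles_by_slide(tile_to_slide)

# No slide appears on both sides of the split.
dev_slides = {tile_to_slide[t] for t in dev_tiles}
test_slides = {tile_to_slide[t] for t in test_tiles}
assert dev_slides.isdisjoint(test_slides)
```

A random split over individual tiles would not satisfy the final assertion in general, which is exactly the leakage the statement describes. The same function works with any grouping key in place of the slide identifier.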

“…To be able to claim that one of the trained models can be considered production ready, the aforementioned optimization processes are not sufficient. There is at least one important factor that could potentially introduce bias to the trained models and that is data leakage, as it is well described by Bussola et al [52]. The final process of this methodology focuses on solving that issue.…”

Section: Production Model Creation (mentioning)

confidence: 99%

Mild Cognitive Impairment Detection Using Machine Learning Models Trained on Data Collected from Serious Games

Karapapas, Goumopoulos 2021

Applied Sciences

Mild cognitive impairment (MCI) is an indicative precursor of Alzheimer’s disease and its early detection is critical to restrain further cognitive deterioration through preventive measures. In this context, the capacity of serious games combined with machine learning for MCI detection is examined. In particular, a custom methodology is proposed, which consists of a series of steps to train and evaluate classification models that could discriminate healthy from cognitive impaired individuals on the basis of game performance and other subjective data. Such data were collected during a pilot evaluation study of a gaming platform, called COGNIPLAT, with 10 seniors. An exploratory analysis of the data is performed to assess feature selection, model overfitting, optimization techniques and classification performance using several machine learning algorithms and standard evaluation metrics. A production level model is also trained to deal with the issue of data leakage while delivering a high detection performance (92.14% accuracy, 93.4% sensitivity and 90% specificity) based on the Gaussian Naive Bayes classifier. This preliminary study provides initial evidence that serious games combined with machine learning methods could potentially serve as a complementary or an alternative tool to the traditional cognitive screening processes.

“…A similar strategy is also adopted in [34], with the further addition of an attention mechanism. Working with tiles, however, requires careful planning of the model training, not to incur in unwanted biases such as the data (or information) leakage: whenever tiles are extracted from the same WSI in both the training and the validation set, model results are heavily affected by overfitting [35].…”

Section: Digital Pathology and Artificial Intelligence (mentioning)

confidence: 99%

“…Metrics are reported indicating average and standard deviation. Moreover, throughout the model training a particular care has been devoted into avoiding overfitting effects such as data (or information) leakage [35]: tiles extracted from the same WSI were not distributed in different training/test data subsets, a careful approach which is now becoming standard in the most recent works being published [131]. Finally, we adopted a plateau learning rate scheduler acted by monitoring metrics on validation set and reducing the learning rate if no improvements occurred for at least ten epochs: the new learning rate was computed as $\eta_{t+1} = \alpha \eta_t$ with $\alpha = 0.2$.…”

Section: EUNet Training and Evaluation (mentioning)

confidence: 99%

Quantification of the Immune Content in Neuroblastoma: Deep Learning and Topological Data Analysis in Digital Pathology

Bussola, Papa, Castellano et al. 2021

IJMS

Self Cite

We introduce here a novel machine learning (ML) framework to address the issue of the quantitative assessment of the immune content in neuroblastoma (NB) specimens. First, the EUNet, a U-Net with an EfficientNet encoder, is trained to detect lymphocytes on tissue digital slides stained with the CD3 T-cell marker. The training set consists of 3782 images extracted from an original collection of 54 whole slide images (WSIs), manually annotated for a total of 73,751 lymphocytes. Resampling strategies, data augmentation, and transfer learning approaches are adopted to warrant reproducibility and to reduce the risk of overfitting and selection bias. Topological data analysis (TDA) is then used to define activation maps from different layers of the neural network at different stages of the training process, described by persistence diagrams (PD) and Betti curves. TDA is further integrated with the uniform manifold approximation and projection (UMAP) dimensionality reduction and the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm for clustering, by the deep features, the relevant subgroups and structures, across different levels of the neural network. Finally, the recent TwoNN approach is leveraged to study the variation of the intrinsic dimensionality of the U-Net model. As the main task, the proposed pipeline is employed to evaluate the density of lymphocytes over the whole tissue area of the WSIs. The model achieves good results with mean absolute error 3.1 on test set, showing significant agreement between densities estimated by our EUNet model and by trained pathologists, thus indicating the potentialities of a promising new strategy in the quantification of the immune content in NB specimens. 
Moreover, the UMAP algorithm unveiled interesting patterns compatible with pathological characteristics, also highlighting novel insights into the dynamics of the intrinsic dataset dimensionality at different stages of the training process. All the experiments were run on the Microsoft Azure cloud platform.
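The plateau learning rate rule quoted in the EUNet citation statement above (multiply the learning rate by α = 0.2 when the validation metric has not improved for ten epochs) can be sketched in a few lines. This is an illustrative standard-library sketch, not the authors' implementation. It assumes that a lower metric is better (for example a validation loss) and that the reduction fires on the tenth consecutive epoch without improvement. The class name `PlateauScheduler` and the starting learning rate are hypothetical.

```python
class PlateauScheduler:
    """Reduce the learning rate by `factor` after `patience` stagnant epochs."""

    def __init__(self, lr, factor=0.2, patience=10):
        self.lr = lr
        self.factor = factor        # alpha in the quoted update rule
        self.patience = patience    # epochs without improvement to tolerate
        self.best = float("inf")    # assumption: lower metric is better
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's validation metric and return the current lr."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor   # new lr = alpha * old lr
                self.bad_epochs = 0
        return self.lr


# Toy run: one improving epoch, then ten epochs with no improvement.
scheduler = PlateauScheduler(lr=1e-3)   # starting lr is an arbitrary choice
scheduler.step(0.50)
for _ in range(10):
    current_lr = scheduler.step(0.55)
assert abs(current_lr - 2e-4) < 1e-12
```

Deep learning frameworks ship their own plateau schedulers, and their exact patience semantics may differ from this sketch by one epoch.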
