2022
DOI: 10.48550/arxiv.2203.00585
Preprint
Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology

Abstract: Tissue phenotyping is a fundamental task in learning objective characterizations of histopathologic biomarkers within the tumor-immune microenvironment in cancer pathology. However, whole-slide imaging (WSI) is a complex computer vision domain in which: 1) WSIs have enormous image resolutions, which precludes large-scale pixel-level efforts in data curation, and 2) the diversity of morphological phenotypes results in inter- and intra-observer variability in tissue labeling. To address these limitations, current efforts have…

Cited by 13 publications (23 citation statements)
References 28 publications
“…We also evaluated the patch embedder performance of the ViT-S-16 and ViT-L-16 models when they are self-supervised by DINO 42 instead of pretrained on the ImageNet as used in Chen and Krishnan. 43 We further compared the performance of the above weakly supervised methods with non-weakly supervised methods by training a patch-level version of the patch embedders directly for classification using the slide-level labels as ground truth.…”
Section: Results
confidence: 99%
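The weakly supervised setup quoted above — frozen patch embedders feeding a slide-level classifier trained only on slide-level labels — is commonly realized with attention-based multiple-instance pooling. A minimal NumPy sketch of that aggregation step follows; the embedding dimension, weights, and single-head attention are illustrative assumptions, not the cited papers' actual architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(patch_embeddings, w_attn, w_cls):
    """Attention-based MIL pooling: softmax-weight the patch embeddings,
    then classify the pooled slide-level embedding (simplified, one head)."""
    scores = patch_embeddings @ w_attn            # (N,) one score per patch
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over patches
    slide_embedding = weights @ patch_embeddings  # (D,) weighted average
    logits = slide_embedding @ w_cls              # (C,) slide-level logits
    return logits, weights

# Toy slide: 100 patches, 384-dim embeddings (the ViT-S-16 output size), 2 classes.
D, N, C = 384, 100, 2
patches = rng.normal(size=(N, D))
w_attn = rng.normal(size=D) * 0.01   # small init keeps the softmax well-behaved
w_cls = rng.normal(size=(D, C))
logits, weights = attention_pool(patches, w_attn, w_cls)
print(logits.shape, round(float(weights.sum()), 6))  # (2,) 1.0
```

In training, the attention weights are learned jointly with the classifier from the slide-level loss, so no pixel- or patch-level labels are needed.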
“… 10: 0.860∗ ± 0.023 / 0.745∗ ± 0.024
GNN (non-weakly-supervised), N/A, Jaume et al. 41: 0.832 ± 0.027 / 0.723 ± 0.025
ViT (ViT-S-16, DINO), ViT-WSI aggregator w/graph, Chen and Krishnan 43: 0.940 ± 0.004 / 0.870 ± 0.012
ViT (ViT-L-16, DINO), ViT-WSI aggregator w/graph: 0.942∗ ± 0.027 / 0.872 ± 0.031
Human Performance: Macro FPR = 0.0901 ± 0.078, Macro TPR = 0.9557 ± 0.042
11-class subtyping:
ResNet50 (non-weakly-supervised), N/A: 0.745 ± 0.022 / 0.414 ± 0.029
ResNet50, CLAM-MB, Lu et al. 15: 0.845 ± 0.021 / 0.536 ± 0.027
ResNet50, ViT-WSI aggregator w/graph: 0.873∗ ± 0.017 / 0.556∗ ± 0.031
ViT (ViT-L-16, non-weakly-supervised), N/A: 0.753 ± 0.027 / 0.425 ± 0.045
ViT (ViT-L-16), Max Pooling: 0.837 ± 0.019 / 0.480 ± 0.031
ViT (ViT-L-16), CLAM-MB: 0.860 ± 0.022 / 0.551 ± 0.041
ViT (ViT-L-16), ViT-WSI aggregator w/graph: 0.887∗ ± 0.024 / 0.563∗ ± 0.030
Inception v3 (non-weakly-supervised), N/A, Coudray et al.…”
Section: Results
confidence: 99%
“…Self-supervised learning (SSL) has proven very effective for label-efficient fine-tuning in natural image classification (Chen et al, 2020;He et al, 2020), video classification (Diba et al, 2021;Kuang et al, 2021), and now even medical image classification and segmentation tasks (Azizi et al, 2021;Taleb et al, 2020;Tang et al, 2021). However, most successful medical applications of SSL operate on 2D data such as histopathological images and radiographs (Chen & Krishnan, 2022;Wang et al, 2021). Some recent studies have developed SSL methods for 3D medical image data, though this has been applied to CT and MRI, where this third dimension is spatial, not temporal (Tang et al, 2021;Taleb et al, 2020).…”
Section: Related Work
confidence: 99%
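The contrastive SSL methods cited above (e.g. Chen et al., 2020) train encoders by pulling two augmented views of the same image together and pushing other images apart. A minimal NumPy sketch of the SimCLR-style NT-Xent loss illustrates the objective; the batch size, dimension, and perturbation-as-augmentation are illustrative assumptions, and real training would use an autograd framework:

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss over B paired views (forward pass only)."""
    z = np.concatenate([z1, z2], axis=0)              # (2B, D) both views stacked
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize embeddings
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    B = z1.shape[0]
    pos = np.concatenate([np.arange(B, 2 * B), np.arange(B)])  # each row's positive
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * B), pos]))

rng = np.random.default_rng(0)
views_a = rng.normal(size=(8, 32))
views_b = views_a + 0.01 * rng.normal(size=(8, 32))  # stand-in for augmented views
loss_aligned = nt_xent(views_a, views_b)
loss_random = nt_xent(views_a, rng.normal(size=(8, 32)))
print(loss_aligned < loss_random)  # aligned pairs yield the lower loss
```

Because the loss needs no labels, it scales to the large unannotated archives typical of histopathology, which is what makes SSL attractive for label-efficient fine-tuning.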
“…Moreover, vision transformers are more robust in multi-task learning [42]. For WSIs, vision transformers perform better in capturing fine-grained morphological features, such as cells and background tissue [43]. Visualization methods have filled a significant gap in neural network understanding in computer vision.…”
Section: Introduction
confidence: 99%