2023
DOI: 10.1101/2023.03.29.534834
Preprint

Leveraging medical Twitter to build a visual–language foundation model for pathology AI

Abstract: The lack of annotated publicly available medical images is a major barrier for innovations. At the same time, many de-identified images and much knowledge are shared by clinicians on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. This is the largest public dataset for pathology images annotated with natural text. We demonstrate the value of this resource by developing PLIP, a…

Cited by 2 publications (1 citation statement)
References 49 publications (62 reference statements)
“…Though no foundational model has been trained that focuses solely on neuropathology, several general-purpose digital pathology models have been introduced. Huang et al developed a visual–language model based on the OpenPath dataset (208,414 image pairs), which originated from medical twitter [ 49 ]. The model was able to retrieve relevant images from text input.…”
Section: Semi-supervised and Self-supervised Machine Learning (mentioning)
confidence: 99%