A visual-language foundation model for computational pathology
Ming Y. Lu, Bowen Chen, Drew F. K. Williamson et al.
Nature Medicine (2024). DOI: 10.1038/s41591-024-02856-4

Cited by 18 publications (1 citation statement)
References 68 publications
“…The EchoCLIP model uses a ConvNeXt-Base [26] image encoder and a Byte-Pair Encoding text tokenizer [27]. The text encoder architecture is a decoder-only transformer identical to the architecture used by the original CLIP paper [23] and has an input context length of 77 tokens. Despite not being directly trained on specific interpretation tasks, EchoCLIP can accurately identify implanted devices as well as assess cardiac form and function (Table 2).…”
Section: Results (mentioning)
Confidence: 99%
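The citation statement specifies the overall shape of the model: a CLIP-style dual encoder with a ConvNeXt-Base image tower and a decoder-only (causally masked) transformer text tower with a 77-token context. The sketch below is a minimal PyTorch illustration of that architecture, not the authors' released EchoCLIP code; the joint embedding width (512), BPE vocabulary size (49,408, CLIP's default), layer/head counts, and pooling at the final token position are all assumptions filled in for a runnable example.

```python
# Minimal sketch (assumed hyperparameters, not the authors' code) of the
# dual-encoder design described in the citation statement.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import convnext_base

CONTEXT_LEN = 77      # input context length reported for the text encoder
EMBED_DIM = 512       # assumed joint embedding size (not stated in the excerpt)
VOCAB_SIZE = 49408    # assumed BPE vocabulary size (CLIP's default)

class TextEncoder(nn.Module):
    """Decoder-only transformer: self-attention with a causal mask."""
    def __init__(self, vocab=VOCAB_SIZE, width=512, layers=6, heads=8):
        super().__init__()
        self.tok = nn.Embedding(vocab, width)
        self.pos = nn.Parameter(torch.empty(CONTEXT_LEN, width).normal_(std=0.01))
        block = nn.TransformerEncoderLayer(width, heads, 4 * width,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.proj = nn.Linear(width, EMBED_DIM, bias=False)

    def forward(self, ids):  # ids: (B, 77) token ids from a BPE tokenizer
        x = self.tok(ids) + self.pos
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        x = self.blocks(x, mask=mask)  # causal (decoder-only) attention
        # Pool at the final token (CLIP pools at the end-of-text token).
        return self.proj(x[:, -1])

class EchoCLIPLike(nn.Module):
    def __init__(self):
        super().__init__()
        # ConvNeXt-Base backbone with its head resized to the joint embedding dim.
        self.image_encoder = convnext_base(weights=None, num_classes=EMBED_DIM)
        self.text_encoder = TextEncoder()
        self.logit_scale = nn.Parameter(torch.tensor(2.659))  # ~ln(1/0.07), CLIP's init

    def forward(self, images, token_ids):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_encoder(token_ids), dim=-1)
        # Cosine-similarity logits for contrastive training or zero-shot scoring.
        return self.logit_scale.exp() * img @ txt.t()

# Smoke test with placeholder inputs; a real pipeline would tokenize report
# text with a byte-pair-encoding tokenizer to produce the 77 token ids.
model = EchoCLIPLike()
images = torch.randn(2, 3, 224, 224)
ids = torch.randint(0, VOCAB_SIZE, (2, CONTEXT_LEN))
print(model(images, ids).shape)  # torch.Size([2, 2])
```

Zero-shot use (e.g., identifying implanted devices) would score an echocardiogram image against text prompts for each candidate label and pick the highest-similarity row, mirroring the original CLIP recipe.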