Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer 2021
DOI: 10.18653/v1/2021.acl-short.90

Towards Visual Question Answering on Pathology Images

Abstract: Pathology imaging is broadly used for identifying the causes and effects of diseases or injuries. Given a pathology image, being able to answer questions about the clinical findings contained in the image is very important for medical decision making. In this paper, we aim to develop a pathological visual question answering framework to analyze pathology images and answer medical questions related to these images. To build such a framework, we create PathVQA, a pathology VQA dataset with 32,795 questions asked…

Citations: cited by 16 publications (6 citation statements)
References: 32 publications (38 reference statements)
“…
… | Modality | Source | Images | QA pairs
VQA-RAD [18] | Radiology | MedPix® database | 0.3k | 3.5k
PathVQA [12] | Pathology | PEIR Digital Library [14] | 5k | 32.8k
SLAKE [23] | Radiology | MSD [3], ChestX-ray8 [36], CHAOS [15] | 0.7k | 14k
VQA-Med-2021 [5] | Radiology | MedPix® database | 5k | 5k
…”
Section: Dataset (citation type: mentioning)
confidence: 99%
“…Finally, the development of the first pathology-specific VQA system (He 2021) showcased an innovative three-level optimization framework, setting new frontiers in cross-modal self-supervised pretraining and finetuning for pathology. This research introduced a three-level optimization framework for VQA on the PathVQA dataset, including self-supervised pretraining, VQA finetuning, and model validation stages.…”
Section: Language Models For Medical Imaging (citation type: mentioning)
confidence: 99%
“…The model uses a learning-by-ignoring method to remove problematic training samples. In [48], an encoder–decoder architecture with a three-level optimization framework that relies on cross-modal self-supervised learning methods was developed to improve performance. Sharma et al [49] proposed a model based on ResNet and BERT models with attention modules to focus on the relevant part of the medical images and questions.…”
Section: Related Work (citation type: mentioning)
confidence: 99%