2023
DOI: 10.1016/j.artmed.2023.102611
Medical visual question answering: A survey

Cited by 23 publications (6 citation statements)
References 27 publications
“…Traditional machine learning methods, such as linear regression, support vector machines [144], and tree-based models [145, 146], are explainable models. Some algorithms provide textual explanations directly, such as medical visual question answering [147]. However, most researchers in medical DL favor visual explanations such as class activation maps (CAMs) [148] and gradient-weighted CAM (Grad-CAM) [149].…”
Section: Overcoming the Challenges
confidence: 99%
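The Grad-CAM technique mentioned in the statement above computes a class-discriminative heatmap from a convolutional layer's activations: channel weights are the global-average-pooled gradients of the class score, and the heatmap is the ReLU of the weighted sum of feature maps. A minimal NumPy sketch with toy arrays standing in for a real network's activations and gradients:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from a conv layer's activations and the gradients
    of the target class score w.r.t. those activations.
    feature_maps, gradients: arrays of shape (C, H, W)."""
    # Channel weights: global-average-pool the gradients over space
    weights = gradients.mean(axis=(1, 2))                       # (C,)
    # Weighted combination of feature maps, then ReLU
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 8 channels of 4x4 feature maps
rng = np.random.default_rng(0)
fmaps = rng.standard_normal((8, 4, 4))
grads = rng.standard_normal((8, 4, 4))
heatmap = grad_cam(fmaps, grads)
```

In practice the gradients would come from backpropagating a class logit through a trained CNN; this sketch only shows the pooling-weighting-ReLU arithmetic itself.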
“…Few surveys have been published about VQA, the different approaches to solving this task, and the development of new data to complement existing benchmarks. Most of them focus on providing a taxonomic structure for the models and datasets applied, with some covering specific sub-areas of VQA [1] while others offer a more general description of the subject [2,3,4,5]. A general comparison is provided in this section between the present work and other surveys in the field, following the research premises established in section 1.2.…”
Section: Comparison With Other Work
confidence: 99%
“…The VQA task consists of accurately answering an image-question pair (I, q) based on characteristics of both I and q. Although formulations of this task appeared prior to 2015 at the intersection of image recognition, natural language processing, and knowledge representation, it gained notoriety with the publication of [37] and the release of VQA v1, an open-access dataset with, as of April 2017, 204,721 images extracted from MS-COCO [38], 1,105,904 free-form and open-ended questions, and 11,059,040 ground-truth answers [37]. However, since the formulation of the task, there has been a significant increase in published datasets with varying degrees of difficulty and image-question distribution balancing [39,28,15,40], with some pertaining to a specific domain, e.g.…”
Section: Visual Question Answering (VQA)
confidence: 99%
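The task formulation quoted above — mapping an image-question pair (I, q) to an answer — is most commonly cast as classification over a fixed answer vocabulary after fusing the two modalities. A minimal NumPy sketch of such a joint-embedding baseline; the feature vectors, weight matrix, and answer vocabulary here are illustrative stand-ins, not part of any particular published model:

```python
import numpy as np

def answer_vqa(image_feat, question_feat, W, b, answers):
    """Joint-embedding VQA baseline: concatenate image and question
    features, apply a linear classifier, and pick the most probable
    answer from a fixed vocabulary."""
    fused = np.concatenate([image_feat, question_feat])  # simple fusion
    logits = W @ fused + b                               # linear classifier
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                 # softmax
    return answers[int(np.argmax(probs))], probs

rng = np.random.default_rng(1)
answers = ["yes", "no", "2", "red"]          # toy answer vocabulary
img = rng.standard_normal(16)                # stand-in for an image embedding
q = rng.standard_normal(8)                   # stand-in for a question embedding
W = rng.standard_normal((len(answers), 24))  # untrained weights, for shape only
b = np.zeros(len(answers))
ans, probs = answer_vqa(img, q, W, b, answers)
```

Real systems replace the random vectors with CNN/transformer image embeddings and RNN/transformer question embeddings, and learn W and b from image-question-answer triples.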
“…[6] However, most existing LLMs still have limitations in handling medical fields involving image content. [7] The recent introduction of GPT-4V(ision) has provided a new tool for the medical field. [8] GPT-4V is a multimodal generalist LLM that can process both images and text, enabling various downstream tasks, including visual question answering (VQA).…”
Section: Introduction
confidence: 99%