2023
DOI: 10.1016/j.artmed.2023.102611
Medical visual question answering: A survey

Cited by 23 publications (6 citation statements)
References 27 publications
“…Traditional machine learning methods, such as linear regression, support vector machines [144], and tree-based models [145, 146], are explainable models. Some algorithms provide textual explanations directly, such as medical visual question answering [147]. However, most researchers in medical DL favor visual explanations such as class activation maps (CAMs) [148] and gradient-weighted CAM (Grad-CAM) [149].…”
Section: Overcoming the Challenges
confidence: 99%
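The Grad-CAM technique mentioned in the statement above computes a class-discriminative heatmap from a convolutional layer's activations: channel weights are the global-average-pooled gradients of the class score, and the heatmap is the ReLU of the weighted sum of feature maps. A minimal NumPy sketch with toy arrays standing in for a real network's activations and gradients:

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from a conv layer's activations and the gradients
    of the target class score w.r.t. those activations.
    feature_maps, gradients: arrays of shape (C, H, W)."""
    # Channel weights: global-average-pool the gradients over space
    weights = gradients.mean(axis=(1, 2))                       # (C,)
    # Weighted combination of feature maps, then ReLU
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalize to [0, 1] for visualization
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example: 8 channels of 4x4 feature maps
rng = np.random.default_rng(0)
fmaps = rng.standard_normal((8, 4, 4))
grads = rng.standard_normal((8, 4, 4))
heatmap = grad_cam(fmaps, grads)
```

In practice the gradients would come from backpropagating a class logit through a trained CNN; this sketch only shows the pooling-weighting-ReLU arithmetic itself.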
“…Few surveys have been published about VQA, the different approaches to solving this task, and the development of new data to complement existing benchmarks. Most of them focus on providing a taxonomic structure for the models and datasets applied, with some covering specific sub-areas of VQA [1] while others offer a more general description of the subject [2,3,4,5]. A general comparison is provided in this section between the present work and other surveys in the field, following the research premises established in section 1.2.…”
Section: Comparison With Other Work
confidence: 99%
“…The VQA task consists of accurately answering an image-question pair (I, q) based on characteristics of both I and q. Although formulations of this task appeared prior to 2015 at the intersection of image recognition, natural language processing, and knowledge representation, it gained notoriety with the publication of [37] and the release of VQA v1, an open-access dataset with, as of April 2017, 204,721 images extracted from MS-COCO [38], 1,105,904 free-form and open-ended questions, and 11,059,040 ground-truth answers [37]. However, since the formulation of the task, there has been a significant increase in published datasets with varying degrees of difficulty and image-question distribution balancing [39,28,15,40], with some pertaining to a specific domain, e.g.…”
Section: Visual Question Answering (VQA)
confidence: 99%
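The task formulation quoted above — mapping an image-question pair (I, q) to an answer — is most commonly cast as classification over a fixed answer vocabulary after fusing the two modalities. A minimal NumPy sketch of such a joint-embedding baseline; the feature vectors, weight matrix, and answer vocabulary here are illustrative stand-ins, not part of any particular published model:

```python
import numpy as np

def answer_vqa(image_feat, question_feat, W, b, answers):
    """Joint-embedding VQA baseline: concatenate image and question
    features, apply a linear classifier, and pick the most probable
    answer from a fixed vocabulary."""
    fused = np.concatenate([image_feat, question_feat])  # simple fusion
    logits = W @ fused + b                               # linear classifier
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                 # softmax
    return answers[int(np.argmax(probs))], probs

rng = np.random.default_rng(1)
answers = ["yes", "no", "2", "red"]          # toy answer vocabulary
img = rng.standard_normal(16)                # stand-in for an image embedding
q = rng.standard_normal(8)                   # stand-in for a question embedding
W = rng.standard_normal((len(answers), 24))  # untrained weights, for shape only
b = np.zeros(len(answers))
ans, probs = answer_vqa(img, q, W, b, answers)
```

Real systems replace the random vectors with CNN/transformer image embeddings and RNN/transformer question embeddings, and learn W and b from image-question-answer triples.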
“…[6] However, most existing LLMs still have limitations in handling medical fields involving image content. [7] The recent introduction of GPT-4V(ision) has provided a new tool for the medical field. [8] GPT-4V is a multimodal generalist LLM that can process both images and text, enabling various downstream tasks, including visual question answering (VQA).…”
Section: Introduction
confidence: 99%