Towards Visual Question Answering on Pathology Images

He, Xuehai; Cai, Zhuo; Wei, Wenlan; Zhang, Yichen; Mou, Luntian; Xing, Eric P.; Xie, Pengtao

doi:10.18653/v1/2021.acl-short.90

Cited by 16 publications

(6 citation statements)

References 32 publications

(38 reference statements)

“…
… | Modality | Source | Images | QA pairs
VQA-RAD [18] | Radiology | MedPix® database | 0.3k | 3.5k
PathVQA [12] | Pathology | PEIR Digital Library [14] | 5k | 32.8k
SLAKE [23] | Radiology | MSD [3], ChestX-ray8 [36], CHAOS [15] | 0.7k | 14k
VQA-Med-2021 [5] | Radiology | MedPix® database | 5k | 5k
…”

Section: Dataset (mentioning)

confidence: 99%

School of Mechanical Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China

Zhuang, Huang, et al. 2019

Mathematical Biosciences and Engineering

The crystallization kinetics and melting behavior of nylon 10,10 in neat nylon 10,10 and in nylon 10,10-montmorillonite (MMT) nanocomposites were systematically investigated by differential scanning calorimetry. The crystallization kinetics results show that the addition of MMT facilitated the crystallization of nylon 10,10 as a heterophase nucleating agent; however, when the content of MMT was high, the physical hindrance of MMT layers to the motion of nylon 10,10 chains retarded the crystallization of nylon 10,10, which was also confirmed by polarized optical microscopy. However, both nylon 10,10 and nylon 10,10-MMT nanocomposites exhibited multiple melting behavior under isothermal and nonisothermal crystallization conditions. The temperature of the lower melting peak (peak I) was independent of MMT content and almost remained constant; however, the temperature of the highest melting peak (peak II) decreased with increasing MMT content due to the physical hindrance of MMT layers to the motion of nylon 10,10 chains.

“…Finally, the development of the first pathology-specific VQA system (He 2021) showcased an innovative three-level optimization framework, setting new frontiers in cross-modal self-supervised pretraining and finetuning for pathology. This research introduced a three-level optimization framework for VQA on the PathVQA dataset, including self-supervised pretraining, VQA finetuning, and model validation stages.…”

Section: Language Models For Medical Imaging (mentioning)

confidence: 99%

Advancing medical imaging with language models: featuring a spotlight on ChatGPT

Hu, Qian, Pan, et al. 2024

Phys. Med. Biol.

This review paper aims to serve as a comprehensive guide and instructional resource for researchers seeking to effectively implement language models in medical imaging research. First, we presented the fundamental principles and evolution of language models, dedicating particular attention to large language models. We then reviewed the current literature on how language models are being used to improve medical imaging, emphasizing a range of applications such as image captioning, report generation, report classification, findings extraction, visual question response systems, interpretable diagnosis and so on. Notably, the capabilities of ChatGPT were spotlighted for researchers to explore its further applications. Furthermore, we covered the advantageous impacts of accurate and efficient language models in medical imaging analysis, such as the enhancement of clinical workflow efficiency, reduction of diagnostic errors, and assistance of clinicians in providing timely and accurate diagnoses. Overall, our goal is to have better integration of language models with medical imaging, thereby inspiring new ideas and innovations. It is our aspiration that this review can serve as a useful resource for researchers in this field, stimulating continued investigative and innovative pursuits of the application of language models in medical imaging.

“…The model uses a learning-by-ignoring method to remove problematic training samples. In [48], an encoder–decoder architecture with a three-level optimization framework that relies on cross-modal self-supervised learning methods was developed to improve performance. Sharma et al. [49] proposed a model based on ResNet and BERT models with attention modules to focus on the relevant part of the medical images and questions.…”

Section: Related Work (mentioning)

confidence: 99%

Vision–Language Model for Visual Question Answering in Medical Imagery

2023

In the clinical and healthcare domains, medical images play a critical role. A mature medical visual question answering (VQA) system can improve diagnosis by answering clinical questions presented with a medical image. Despite its enormous potential in the healthcare industry and services, this technology is still in its infancy and is far from practical use. This paper introduces an approach based on a transformer encoder–decoder architecture. Specifically, we extract image features using the vision transformer (ViT) model, and we embed the question using a textual encoder transformer. Then, we concatenate the resulting visual and textual representations and feed them into a multi-modal decoder for generating the answer in an autoregressive way. In the experiments, we validate the proposed model on two VQA datasets for radiology images termed VQA-RAD and PathVQA. The model shows promising results compared to existing solutions. It yields closed and open accuracies of 84.99% and 72.97%, respectively, for VQA-RAD, and 83.86% and 62.37%, respectively, for PathVQA. Other metrics such as the BLEU score showing the alignment between the predicted and true answer sentences are also reported.
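
As a rough illustration of how separate closed and open accuracies like those in the abstract above might be computed, here is a minimal sketch. It assumes each question is tagged as "closed" or "open" and that a prediction counts as correct on a case-insensitive exact match. The function name `split_accuracy`, the record layout, and the toy data are all hypothetical and are not taken from the cited paper, whose scoring rules may differ.

```python
from collections import defaultdict


def split_accuracy(records):
    """Return exact-match accuracy (in percent) per question type.

    Each record is a (question_type, predicted_answer, true_answer) tuple,
    where question_type is "closed" or "open" (hypothetical layout).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qtype, pred, truth in records:
        total[qtype] += 1
        # Assumption: case-insensitive exact match after trimming whitespace.
        if pred.strip().lower() == truth.strip().lower():
            correct[qtype] += 1
    return {qtype: 100.0 * correct[qtype] / total[qtype] for qtype in total}


# Toy data, invented for illustration only.
toy = [
    ("closed", "yes", "Yes"),
    ("closed", "no", "yes"),
    ("open", "liver", "liver"),
    ("open", "left lung", "right lung"),
    ("open", "kidney", "Kidney "),
]
print(split_accuracy(toy))
```

Exact match is a strict criterion for free-form open answers, which may be one reason the abstract also mentions sentence-level metrics such as BLEU.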
