Visual Question Answering (VQA) is a multimodal task spanning Computer Vision (CV) and Natural Language Processing (NLP); its goal is to build high-efficiency VQA models. A fine-grained, simultaneous understanding of both the visual content of images and the textual content of questions lies at the heart of VQA. In this paper, we propose novel Multimodal Encoder-Decoder Attention Networks (MEDAN). MEDAN consists of Multimodal Encoder-Decoder Attention (MEDA) layers cascaded in depth, and captures rich, well-grounded question and image features by associating keywords in the question with important object regions in the image. Each MEDA layer contains an Encoder module that models the self-attention of questions, and a Decoder module that models both the question-guided attention and the self-attention of images. Experimental results on the benchmark VQA-v2 dataset demonstrate that MEDAN achieves state-of-the-art VQA performance. With the Adam solver, our best single model delivers 71.01% overall accuracy on the test-std set; with the AdamW solver, we achieve 70.76% overall accuracy on the test-dev set. Extensive ablation studies further explore the reasons for MEDAN's effectiveness.
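The encoder-decoder attention flow described above can be sketched in minimal form. This is an illustrative NumPy sketch, not the paper's implementation: the function names (`attention`, `meda_layer`) are hypothetical, and multi-head projections, feed-forward sublayers, residual connections, and layer normalization are all omitted for brevity. It shows only the attention routing of one MEDA layer: question self-attention in the Encoder, then image self-attention followed by question-guided attention in the Decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def meda_layer(question_feats, image_feats):
    # Hypothetical sketch of one MEDA layer (simplified; single head,
    # no feed-forward, residual, or normalization sublayers).
    # Encoder: self-attention over question word features.
    q_enc = attention(question_feats, question_feats, question_feats)
    # Decoder step 1: self-attention over image region features.
    img_sa = attention(image_feats, image_feats, image_feats)
    # Decoder step 2: question-guided attention — image regions query
    # the encoded question (keys/values come from the Encoder output).
    img_dec = attention(img_sa, q_enc, q_enc)
    return q_enc, img_dec
```

In a full model such layers would be cascaded in depth, with each layer's question and image outputs feeding the next.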