Semantic Tree-Structured Representation for Visual Question Answering System

Rekha, K.; Chitrakala, S.

doi:10.1007/978-981-16-5348-3_29

Proceedings of International Conference on Data Science and Applications

2021


References 8 publications

Cited by 1 publication

Knowledge-Based Visual Question Answering Using Multi-Modal Semantic Graph

Lei

Meng

2023

Electronics


The field of visual question answering (VQA) has seen a growing trend of integrating external knowledge sources to improve performance. However, owing to the potential incompleteness of external knowledge sources and the inherent mismatch between different forms of data, current knowledge-based visual question answering (KBVQA) techniques still struggle to integrate and utilize multiple heterogeneous data sources effectively. To address this issue, a novel approach centered on a multi-modal semantic graph (MSG) is proposed. The MSG serves as a mechanism for unifying the representation of heterogeneous data and diverse types of knowledge. Additionally, a multi-modal semantic graph knowledge reasoning model (MSG-KRM) is introduced to perform reasoning and deep fusion of image–text information and external knowledge sources. To build the semantic graph, keywords are extracted from the image object detection information, the question text, and external knowledge texts, and are represented as symbol nodes. Three types of semantic graphs (vision, question, and external knowledge text) are then constructed based on the knowledge graph, and non-symbol nodes are added to connect these three independent graphs, with every node and edge marked by its type. During the inference stage, the multi-modal semantic graph and image–text information are embedded into the feature semantic graph through three embedding methods, and a type-aware graph attention module is employed for deep reasoning. The final answer prediction combines the output of the pre-trained model, the graph pooling results, and the characteristics of the non-symbol nodes. Experimental results on the OK-VQA dataset show that the MSG-KRM model outperforms existing methods in overall accuracy, achieving a score of 43.58, and improves accuracy for most question subclasses, demonstrating the effectiveness of the proposed method.
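The abstract describes the MSG only in prose. The Python sketch below shows one possible reading of its construction step, under stated assumptions. Each source (vision, question, external knowledge) gets its own subgraph of typed symbol nodes, one per keyword. Non-symbol nodes then join the three subgraphs. The hub-node layout, the edge-type names, the `build_msg` helper, and the rule of linking identical keywords across subgraphs are all hypothetical illustrations rather than the MSG-KRM authors' construction. Keyword extraction, knowledge-graph grounding, the three embedding methods, and the type-aware graph attention module are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    # node id -> (label, node_type, is_symbol)
    nodes: dict = field(default_factory=dict)
    # list of (src_id, dst_id, edge_type) triples
    edges: list = field(default_factory=list)

    def add_node(self, label, node_type, is_symbol):
        node_id = len(self.nodes)
        self.nodes[node_id] = (label, node_type, is_symbol)
        return node_id

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def build_msg(vision_kw, question_kw, knowledge_kw):
    """Hypothetical MSG builder: one typed subgraph per source,
    joined by non-symbol hub nodes and a single global node."""
    g = SemanticGraph()
    sources = {"vision": vision_kw, "question": question_kw, "knowledge": knowledge_kw}
    root = g.add_node("<global>", "global", is_symbol=False)
    symbol_ids = {}  # (source, keyword) -> node id
    for source, keywords in sources.items():
        # Non-symbol hub node that stands for the whole subgraph (assumption).
        hub = g.add_node(f"<{source}>", source, is_symbol=False)
        g.add_edge(root, hub, "global-hub")
        for kw in dict.fromkeys(keywords):  # deduplicate, keep order
            nid = g.add_node(kw, source, is_symbol=True)
            symbol_ids[(source, kw)] = nid
            g.add_edge(hub, nid, f"{source}-member")
    # Link the same keyword when it appears in two different subgraphs (assumption).
    names = list(sources)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for kw in dict.fromkeys(sources[a]):
                if (b, kw) in symbol_ids:
                    g.add_edge(symbol_ids[(a, kw)], symbol_ids[(b, kw)], "same-keyword")
    return g


g = build_msg(["dog", "frisbee", "grass"], ["dog", "catch"], ["frisbee", "dog", "toy"])
print(len(g.nodes), len(g.edges))
```

With these toy keyword lists, the graph has 12 nodes: 8 symbol nodes, 3 hubs, and 1 global node. It has 15 typed edges, four of which are `same-keyword` links. Storing the type on every node and edge is what a type-aware attention layer would later condition on. That is the role the abstract assigns to the node and edge markings.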
