2021
DOI: 10.2298/csis200515038g

Double-layer affective visual question answering network

Abstract: Visual Question Answering (VQA) has attracted much attention recently in both the natural language processing and computer vision communities, as it offers insight into the relationships between two relevant sources of information. Tremendous advances have been made in the field of VQA due to the success of deep learning. Building on these advances and improvements, the Affective Visual Question Answering Network (AVQAN) enriches the understanding and analysis of VQA models by making use of the emotional info…

Cited by 4 publications (2 citation statements)
References 18 publications
“…In contrast to single-modal tasks, multimodal tasks require not only extracting and understanding information from each single modality but also combining information from two different modalities for reasoning. Although this is challenging, researchers have already tackled many multimodal tasks, for instance image-text matching [1], [2], image captioning [3], [4], and VQA [5], [6], [26]. As a typical representative of multimodal tasks, VQA requires understanding both visual information and question information and, what's more, combining the two to reason about the answer.…”
Section: Introduction
Mentioning (confidence: 99%)
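
The fusion the citing passage describes, combining a visual representation and a question representation to reason about an answer, can be illustrated with a minimal sketch. This is not the AVQAN architecture from the paper; the feature dimensions, the element-wise-product fusion, and the fixed answer vocabulary below are illustrative assumptions only.

# Minimal late-fusion VQA sketch (illustrative; not the paper's AVQAN model).
import torch
import torch.nn as nn

class LateFusionVQA(nn.Module):
    def __init__(self, img_dim=2048, q_dim=768, hidden=512, n_answers=1000):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)      # project pooled image features
        self.q_proj = nn.Linear(q_dim, hidden)          # project pooled question features
        self.classifier = nn.Linear(hidden, n_answers)  # scores over a fixed answer vocabulary

    def forward(self, img_feat, q_feat):
        # Fuse the two modalities with an element-wise product, a common VQA baseline.
        fused = torch.tanh(self.img_proj(img_feat)) * torch.tanh(self.q_proj(q_feat))
        return self.classifier(fused)

model = LateFusionVQA()
img_feat = torch.randn(1, 2048)   # stand-in for CNN image features
q_feat = torch.randn(1, 768)      # stand-in for a question embedding
logits = model(img_feat, q_feat)  # shape (1, 1000): one score per candidate answer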
“…The result is usually represented by a grayscale image, in which the grayscale value of each pixel indicates the probability that the pixel belongs to a salient object. Salient object detection has become an important preprocessing step in many computer vision applications, including image and video compression [2], image relocation [3], video tracking [4], robot navigation [5], etc.…”
Section: Introduction
Mentioning (confidence: 99%)
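
To make the preprocessing step in that passage concrete: a predicted saliency map is a grayscale probability image, and downstream applications often binarize it before use. The fixed 0.5 threshold and the random stand-in map below are illustrative assumptions, not a method from the cited work.

# Illustrative only: binarizing a grayscale saliency map for downstream use.
import numpy as np

saliency = np.random.rand(64, 64)         # stand-in for a predicted map; values in [0, 1]
mask = (saliency > 0.5).astype(np.uint8)  # fixed threshold; adaptive thresholds are also common
print(int(mask.sum()), "pixels marked as salient")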