Word-to-region attention network for visual question answering
2018
DOI: 10.1007/s11042-018-6389-3

Cited by 25 publications (2 citation statements)
References 35 publications
“…Yang et al. [18] proposed classifying questions by type and applying a co-attention mechanism. In 2019, Peng et al. [19] combined two general attention units, SA (self-attention) and GA (guided attention), into a modular co-attention structure. In 2020, Guo et al. [20] proposed a visual question answering method based on a re-attention mechanism, which uses the answer to compute attention weights over the image and defines an attention-consistency loss that measures the distance between the visual attention learned from the question and that learned from the answer, adjusting the image attention distribution accordingly.…”
Section: Attention Mechanism
Confidence: 99%
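The SA/GA attention units described in the excerpt above can be sketched as follows. This is a minimal illustration assuming a standard scaled dot-product attention design; the module names, dimensions, and usage below are illustrative assumptions, not code from the cited papers.

```python
# Minimal sketch of SA (self-attention) and GA (guided-attention) units,
# assuming a standard scaled dot-product attention design (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DotProductAttention(nn.Module):
    """Scaled dot-product attention: queries attend over a context."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** 0.5

    def forward(self, query, context):
        q, k, v = self.q(query), self.k(context), self.v(context)
        weights = F.softmax(q @ k.transpose(-2, -1) / self.scale, dim=-1)
        return weights @ v


class SA(nn.Module):
    """Self-attention unit: a feature set attends to itself."""

    def __init__(self, dim):
        super().__init__()
        self.attn = DotProductAttention(dim)

    def forward(self, x):
        return self.attn(x, x)


class GA(nn.Module):
    """Guided-attention unit: one modality's features attend to another's
    (e.g. image regions guided by question words)."""

    def __init__(self, dim):
        super().__init__()
        self.attn = DotProductAttention(dim)

    def forward(self, x, guide):
        return self.attn(x, guide)


# Illustrative usage: 14 question-word features and 36 region features.
words = torch.randn(1, 14, 512)
regions = torch.randn(1, 36, 512)
words = SA(512)(words)           # refine words with word-to-word self-attention
fused = GA(512)(regions, words)  # regions attend to the refined question words
```

Stacking such SA and GA units is what the cited modular co-attention structure refers to; a re-attention variant would additionally compute answer-guided attention and penalize its distance from the question-guided attention.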
“…with more details and semantics, which is helpful for other visual understanding tasks such as visual captioning (Bin et al. 2017; Gao et al. 2017) and visual question answering (Peng et al. 2018; Gao et al. 2018). Sadeghi and Farhadi (2011) first defined a subject-predicate-object triplet as a visual phrase and trained classifiers for every triplet phrase.…”
Section: Relationship Prediction
Confidence: 99%