2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00473

Attention on Attention for Image Captioning

Abstract: Attention mechanisms are widely used in current encoder/decoder frameworks of image captioning, where a weighted average over the encoded vectors is generated at each time step to guide the caption decoding process. However, the decoder has little idea of whether or how well the attended vector and the given attention query are related, which can lead the decoder to produce misleading results. In this paper, we propose an "Attention on Attention" (AoA) module, which extends the conventional attention mechanisms to determine…
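In equation form, the AoA operation described here can be sketched as follows (a hedged reconstruction from the module description quoted in the citation statements below; the weight names W_i, W_g, b_i, b_g are illustrative, not taken from the paper):

    v̂ = f_att(Q, K, V)                                      (conventional attention result)
    AoA(Q, K, V) = σ(W_g [v̂ ; Q] + b_g) ⊙ (W_i [v̂ ; Q] + b_i)

where [ · ; · ] denotes concatenation, the σ branch produces the sigmoid attention gate, and the W_i branch produces the information vector.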

Cited by 737 publications (589 citation statements)
References 39 publications
“…The Up-Down [7] method proposed a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of salient image regions. The AoANet [9] method introduced an extension of the attention operator in which the final attended information is weighted by a gate. Our work is developed on top of these methods.…”
Section: Quantitative Results
Confidence: 99%
“…AoANet model: This model is proposed in AoANet [9], where the results of self-attention and the initial query are concatenated and fed into two linear layers; the resulting information vector is multiplied by a sigmoid gate. The final result is used in place of the original self-attention output.…”
Section: Baseline Methods
Confidence: 99%
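For concreteness, here is a minimal PyTorch sketch of such a gated "attention on attention" block as the excerpt describes it. This is an illustrative reconstruction, not the authors' released implementation; the class and attribute names (AoA, info, gate) are invented for this sketch.

    import torch
    import torch.nn as nn

    class AoA(nn.Module):
        """Attention on Attention (illustrative sketch): concatenate the
        attended vector with the original query, then multiply an
        information vector by a sigmoid gate, each produced by a linear
        layer over the concatenation."""
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.info = nn.Linear(2 * dim, dim)  # "information vector" branch
            self.gate = nn.Linear(2 * dim, dim)  # sigmoid "attention gate" branch

        def forward(self, query, key, value):
            attended, _ = self.attn(query, key, value)  # conventional attention
            cat = torch.cat([attended, query], dim=-1)  # [B, T, 2*dim]
            return torch.sigmoid(self.gate(cat)) * self.info(cat)

    # Self-attention case (query == key == value), e.g. refining region features:
    x = torch.randn(2, 10, 512)   # 2 images, 10 region features, dim 512
    out = AoA(512)(x, x, x)
    print(out.shape)              # torch.Size([2, 10, 512])

As the excerpt notes, the gated output then stands in wherever the plain self-attention result would otherwise be used.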