2022
DOI: 10.1109/tcsvt.2022.3169842
Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Cited by 32 publications (7 citation statements)
References 49 publications
“…Earlier approaches for video retrieval mainly revolve around code books [2,20,22] and hashing functions [32,33] for encoding a video into a low-dimensional representation. Fueled by the success of deep learning [6,10,25,27,28,41] in recent years, the predominant approaches are to decompose the video into frames and feed them into an image extraction backbone network, obtaining a sequence of image feature representations. One approach is to fuse all these image features into a single video-level representation and perform similar video pair detection on video-level representations [21,23,24].…”
Section: Video Retrieval
confidence: 99%
“…This method uses a global context pooling mechanism to enhance the spatially informative channels and was verified to be effective in image understanding tasks. A recent work by Hao et al [26] studied the insertion of channel context into the spatio-temporal attention learning block for element-wise feature refinement.…”
Section: Related Work
confidence: 99%
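The global context pooling described above follows the familiar squeeze-and-excitation pattern: spatially pool each channel into a descriptor, pass it through a small bottleneck, and use the resulting gate to reweight channels. The sketch below is a minimal NumPy illustration of that pattern, not the cited method's implementation; the function name, weight shapes, and bottleneck size are assumptions.

```python
import numpy as np

def global_context_pooling(x, w1, w2):
    """Illustrative SE-style channel reweighting.

    x  : (C, H, W) feature map
    w1 : (R, C) bottleneck projection (R < C)
    w2 : (C, R) expansion back to C channels
    """
    # Squeeze: global average pool over the spatial dims -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gate in (0, 1)
    h = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ h)))          # (C,)
    # Reweight: broadcast the per-channel gate over H and W
    return x * gate[:, None, None]
```

Because the gate lies in (0, 1), the module can only attenuate channels relative to the input, which is what lets it emphasize the "spatially informative" ones.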
“…The Stand-alone Inter-Frame Attention [62] is an attention mechanism that operates across multiple frames, computing local self-attention for every spatial position. Hao et al [63] propose an effective attention-in-attention technique for enhancing element-wise features, exploring the possibility of integrating channel context into the spatio-temporal attention learning module. Visual attention network [64] uses a large kernel attention to support the establishment of self-adaptive and extended-range correlations of self-attention.…”
Section: Attention Mechanism
confidence: 99%
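The idea of integrating channel context into spatio-temporal attention can be caricatured as combining two cheap attention maps, one over channels and one over space-time, into a single element-wise refinement. The sketch below is only an illustration of that combination under assumed shapes; it is not the attention-in-attention module of [63].

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def channel_context_st_attention(x):
    """Illustrative element-wise refinement for video features.

    x : (T, C, H, W) clip features
    """
    # Channel context: pool over time and space -> per-channel gate (C,)
    chan = sigmoid(x.mean(axis=(0, 2, 3)))
    # Spatio-temporal map: pool over channels -> per-position gate (T, H, W)
    st = sigmoid(x.mean(axis=1))
    # Element-wise refinement: broadcast the two gates into a (T, C, H, W) map
    attn = chan[None, :, None, None] * st[:, None, :, :]
    return x * attn
```

The point of the factorization is cost: two pooled gates of sizes C and T·H·W replace a full T·C·H·W attention tensor computed from scratch.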