Visual Attention Network
2022 · Preprint
DOI: 10.48550/arxiv.2202.09741

Cited by 63 publications (116 citation statements). References 0 publications.
“…[113]–[116] Channel & spatial attention: predict channel and spatial attention masks separately (e.g., [6, 117]) or generate a joint 3-D channel × height × width attention mask directly (e.g., [118]–[120]) and use it to select important features. [6, 10, 13, 14, 50, 101, 117–119, 121–130] Spatial & temporal attention…”
Section: Introduction (mentioning)
Confidence: 99%
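The separate channel/spatial mask design this statement summarizes can be made concrete with a minimal PyTorch sketch in the spirit of CBAM; the module names, reduction ratio, and kernel size below are illustrative assumptions, not the exact configuration of any cited work.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Predicts a per-channel mask from globally pooled features (SE/CBAM style)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.mlp(x.mean(dim=(2, 3)))          # (B, C) from global average pooling
        return torch.sigmoid(w).view(b, c, 1, 1)  # channel mask, broadcast over H, W

class SpatialAttention(nn.Module):
    """Predicts an (H, W) mask from channel-pooled features."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return torch.sigmoid(self.conv(pooled))               # spatial mask

x = torch.randn(1, 64, 32, 32)
ca, sa = ChannelAttention(64), SpatialAttention()
y = x * ca(x)   # select important channels
y = y * sa(y)   # then select important locations
```

A joint 3-D mask, by contrast, would output a full (C, H, W) tensor in one shot rather than composing the two broadcast masks above.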
“…Concurrent works. We notice three concurrent works, including ConvNeXt [42], RepLKNet [14] and Visual Attention Network (VAN) [20]. All these works are motivated by large receptive fields and exploit convolutions with large or dilated kernels as the main building block.…”
Section: Convolutions (mentioning)
Confidence: 99%
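A hedged sketch of the shared design point this statement names, a large effective receptive field from depthwise convolutions, built either as one large kernel (the RepLKNet direction) or decomposed into a small depthwise conv followed by a dilated depthwise conv (the decomposition direction VAN takes); the specific kernel sizes and dilation below are illustrative, not the published configurations.

```python
import torch
import torch.nn as nn

def large_kernel_dw(channels, kernel_size=31):
    """One large depthwise kernel: direct large receptive field."""
    return nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, groups=channels)

def decomposed_large_kernel(channels):
    """A similar receptive field from a small depthwise conv followed by a
    dilated depthwise conv, at far fewer parameters per channel."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
        nn.Conv2d(channels, channels, 7, padding=9, dilation=3, groups=channels),
    )

x = torch.randn(1, 32, 56, 56)
# Both variants preserve spatial resolution, so they are drop-in building blocks.
assert large_kernel_dw(32)(x).shape == decomposed_large_kernel(32)(x).shape
```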
“…After that, more attention mechanisms [38]–[41] were proposed, such as self-attention [42] and channel attention [38]. Nowadays, attention mechanisms have been applied in many visual tasks [37]–[41], [43], [44]. As for self-supervised monocular depth estimation, attention-based networks have been applied in [19]–[21].…”
Section: Attention Mechanism (mentioning)
Confidence: 99%
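For contrast with the channel attention sketched earlier, here is a minimal single-head spatial self-attention over flattened H×W tokens, the other mechanism this statement names; the class name and the single-head simplification are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Single-head self-attention over flattened H*W spatial tokens."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, N, C) with N = H*W
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, N, N): each pixel attends to all
        out = self.proj(attn.softmax(dim=-1) @ v)
        return out.transpose(1, 2).reshape(b, c, h, w)

y = SpatialSelfAttention(64)(torch.randn(1, 64, 16, 16))
```

The quadratic (N × N) attention map is what motivates the large-kernel convolutional alternatives discussed in the previous statement.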
“…For the encoder, we employ a visual attention network (VAN) [44] to extract multi-scale feature maps X_i^e, i = 1, 2, 3, 4. A VAN has four stages, where spatial adaptability and channel adaptability are efficiently implemented by the large kernel attention.…”
Section: B. VADepth Network Architecture (mentioning)
Confidence: 99%
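A toy sketch of the encoder usage this statement describes: four downsampling stages, each with a large-kernel-attention block (depthwise conv → dilated depthwise conv → 1×1 conv, multiplied back onto the input, following VAN's decomposition), returning one feature map per stage. Stage widths, strides, and the `TinyVANEncoder` name are illustrative assumptions, not the published VAN configuration.

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large kernel attention: a decomposed large-kernel depthwise convolution
    produces an attention map that reweights the input features."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # spatial and channel adaptability from one attention map

class TinyVANEncoder(nn.Module):
    """Four stages; each downsamples by 2 and applies an LKA block."""
    def __init__(self, dims=(32, 64, 128, 256)):
        super().__init__()
        in_ch, self.stages = 3, nn.ModuleList()
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, d, 3, stride=2, padding=1), nn.GELU(), LKA(d)))
            in_ch = d

    def forward(self, x):
        feats = []                    # multi-scale maps X_i^e, i = 1..4
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats

feats = TinyVANEncoder()(torch.randn(1, 3, 256, 256))
# feature resolutions: 128, 64, 32, 16 — a pyramid a depth decoder can consume
```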