2020
DOI: 10.1609/aaai.v34i07.6718
Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection

Abstract: Spatiotemporal information is essential for video salient object detection (VSOD) because object motion strongly attracts human attention. Previous VSOD methods usually use Long Short-Term Memory (LSTM) or 3D ConvNets (C3D), which can encode motion information only through step-by-step propagation in the temporal domain. Recently, the non-local mechanism was proposed to capture long-range dependencies directly. However, applying the non-local mechanism to VSOD is not straightforward, because i) it …
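The non-local mechanism the abstract refers to can be sketched in a few lines. This is a toy illustration of the core idea (every position attends to every other position in one step), not the paper's pyramid-constrained variant; the function names and the dot-product affinity are our assumptions:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def non_local_block(features):
    """Toy non-local aggregation over a flattened feature map.

    `features` is a list of N position vectors. Each output vector is a
    softmax-weighted sum of ALL input vectors, so long-range dependencies
    are captured in a single step rather than by step-by-step temporal
    propagation as in LSTM/C3D pipelines.
    """
    out = []
    for q in features:
        # Affinity of the query position with every position (dot product).
        weights = softmax([dot(q, k) for k in features])
        # Aggregate every position's features with those weights.
        agg = [sum(w * v[i] for w, v in zip(weights, features))
               for i in range(len(q))]
        out.append(agg)
    return out
```

For example, `non_local_block([[1.0, 0.0], [0.0, 1.0]])` returns two vectors in which each position keeps most of its own feature but mixes in the other position's, regardless of their distance in the original map.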

Cited by 126 publications (76 citation statements)
References 42 publications
“…Visual comparisons of the proposed method and the state-of-the-art algorithms. From left to right: the input image, the ground truth (GT), and the saliency maps produced by our proposed method, SCOM [48], SCNN [50], DLVSD [33], FGRN [14], MBNM [51], MST [49], PSCA [52], PDB [11], LSTI [53], RCR [12], and SSAV [39]. Our method consistently produces saliency maps closest to the ground truth.…”
Section: Visual Comparison
confidence: 91%
“…We compare our video saliency detection network with 14 other state-of-the-art models, including MDB [43], MST [49], STBP [32], SFLR [30], SCOM [48], SCNN [50], DLVS [33], FGRN [14], MBNM [51], PDBM [11], RCRNet [12], SSAV [39], PSCA [52], and LSTI [53]. For a fair comparison, we use the code provided by Fan et al. [39] to compute these metrics on our video saliency maps.…”
Section: Comparison With the State-of-the-Art Methods, 1) Quantitative
confidence: 99%
“…In addition, CBAM [55] captures feature information from spatial and channel attention simultaneously, which significantly improves the feature representation ability. Recently, the non-local neural network [56] has been widely used in salient object detection [58], image super-resolution [59], etc. Its main purpose is to enhance the features at the current position by aggregating contextual information from other positions, addressing the problem that the receptive field of a single convolutional layer cannot effectively cover correlated regions.…”
Section: Attention in CNNs
confidence: 99%
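The channel-attention idea mentioned in the statement above can be sketched minimally. This is a simplified gate in the spirit of CBAM's channel branch, not its actual design (CBAM feeds both max- and average-pooled descriptors through a shared MLP); the function name and the plain sigmoid-of-average gate are our assumptions:

```python
import math

def channel_attention(feature_map):
    """Toy channel-attention gate: rescale each channel by a sigmoid of
    its global average, so channels with strong average activation are
    emphasised and weak ones are suppressed.

    `feature_map` is a list of channels, each a flat list of spatial values.
    """
    gated = []
    for channel in feature_map:
        avg = sum(channel) / len(channel)          # global average pooling
        gate = 1.0 / (1.0 + math.exp(-avg))        # sigmoid gate in (0, 1)
        gated.append([gate * v for v in channel])  # rescale the channel
    return gated
```

A channel with strongly positive average activation is passed through almost unchanged (gate near 1), while the spatial pattern within each channel is preserved, since the same scalar gate multiplies every position.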
“…Among modern Convolutional Neural Networks (ConvNets/CNNs), many techniques, e.g., dynamic heads with attention [1], dual attention [2], and self-attention [3], have gained increasing attention due to their capability. Still, all suffer from accuracy issues.…”
Section: Introduction
confidence: 99%