2020
DOI: 10.1609/aaai.v34i07.6927
SalSAC: A Video Saliency Prediction Model with Shuffled Attentions and Correlation-Based ConvLSTM

Abstract: The performance of predicting human fixations in videos has been greatly improved by the development of convolutional neural networks (CNNs). In this paper, we propose a novel end-to-end neural network, "SalSAC", for video saliency prediction, which uses CNN-LSTM-Attention as the basic architecture and utilizes information from both static and dynamic aspects. To better represent the static information of each frame, we first extract multi-level features of the same size from different layers of th…
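The abstract's multi-level feature fusion can be illustrated with a minimal sketch. This is not the authors' code: the shapes, function names, and the attention form are all assumptions — a softmax-weighted fusion over feature levels stands in for the paper's shuffled attention, which operates on permuted channels.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_multilevel(features):
    """Fuse multi-level CNN features of the same spatial size.

    features: list of (C, H, W) arrays taken from different layers.
    A per-level scalar score (global average activation) is turned into
    softmax attention weights, and the levels are summed under those
    weights -- a simplified stand-in for the paper's shuffled attention.
    """
    scores = np.array([f.mean() for f in features])   # one score per level
    weights = softmax(scores)                         # attention over levels
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights

# toy example: three feature levels of identical shape
rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, 14, 14)) for _ in range(3)]
fused, w = fuse_multilevel(levels)
assert fused.shape == (8, 14, 14)
assert np.isclose(w.sum(), 1.0)
```

The key constraint from the abstract is only that the per-layer features share one spatial size so they can be combined elementwise; how the weights are computed here is illustrative.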

Cited by 41 publications (32 citation statements). References 24 publications.
“…STRA-Net [11] adopts a dual-pathway architecture combining a 2D ResNet50 with ConvLSTM, while the proposed STCED utilizes a dual-pathway 3D ResNet50 as the encoder, which implicitly justifies the capability of the 3D CNN. As shown in Table 3, all four of these models [11], [13]-[15] perform far behind STCED on the DHF1K test set, which verifies the effectiveness of the proposed model. 1 https://mmcheng.net/videosal/…”
Section: E. Comparison With the State-of-the-art (supporting)
confidence: 53%
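The ConvLSTM mentioned in this statement (a CNN pathway feeding a recurrent cell that keeps spatial structure) can be sketched as a single-step cell. This is a hedged illustration, not code from either cited paper: 1x1 convolutions (a per-pixel matmul) deliberately replace the usual 3x3 kernels, and all names and sizes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convlstm_step(x, h, c, W):
    """One ConvLSTM step with 1x1 convolutions, a simplification
    of the 3x3 kernels typically used in saliency models.

    x: (Cx, H, W) input features; h, c: (Ch, H, W) hidden/cell state.
    W: (4*Ch, Cx+Ch) weights producing the i, f, o, g gates jointly.
    """
    z = np.concatenate([x, h], axis=0)                 # (Cx+Ch, H, W)
    gates = np.einsum('oc,chw->ohw', W, z)             # 1x1 conv over channels
    i, f, o, g = np.split(gates, 4, axis=0)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # cell-state update
    h_new = sigmoid(o) * np.tanh(c_new)                # hidden-state update
    return h_new, c_new

# toy shapes: 8 input channels, 4 hidden channels, 7x7 spatial grid
rng = np.random.default_rng(1)
Cx, Ch, H, Wd = 8, 4, 7, 7
x = rng.standard_normal((Cx, H, Wd))
h = np.zeros((Ch, H, Wd)); c = np.zeros((Ch, H, Wd))
W = 0.1 * rng.standard_normal((4 * Ch, Cx + Ch))
h, c = convlstm_step(x, h, c, W)
assert h.shape == (Ch, H, Wd)
```

Because the gates are convolutional rather than fully connected, the hidden state stays a spatial map, which is why ConvLSTM pathways are a natural fit for saliency prediction.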
“…Several models [7], [10], [13], [14] based on LSTM have only one data stream. In ACLNet [7], spatial features of each frame are extracted by a CNN with an attention subnetwork.…”
Section: B. Modern Dynamic Saliency Models (mentioning)
confidence: 99%