2020
DOI: 10.1109/access.2020.2968024

Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Abstract: Learning spatiotemporal features via 3D-CNN (3D Convolutional Neural Network) models has been regarded as an effective approach for action recognition. In this paper, we explore visual attention mechanism for video analysis and propose a novel 3D-CNN model, dubbed AE-I3D (Attention-Enhanced Inflated-3D Network), for learning attention-enhanced spatiotemporal representation. The contribution of our AE-I3D is threefold: First, we inflate soft attention in spatiotemporal scope for 3D videos, and adopt softmax to …
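To make the abstract's mechanism concrete, below is a minimal PyTorch sketch of soft attention inflated over the spatiotemporal scope of a 3D feature map, with softmax producing a probability distribution over locations. The module name, tensor shapes, and the 1x1x1 scoring convolution are illustrative assumptions, not the authors' exact AE-I3D block.

```python
import torch
import torch.nn as nn


class SpatioTemporalSoftAttention(nn.Module):
    """Illustrative soft attention over a 3D feature map (C, T, H, W).

    A 1x1x1 convolution scores every spatiotemporal location, softmax turns
    the scores into a probability distribution, and the features are
    re-weighted by that distribution (hypothetical layout, not the exact
    AE-I3D design).
    """

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)  # one score per location

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, H, W) features from an inflated 3D backbone
        n, c, t, h, w = x.shape
        logits = self.score(x).view(n, 1, -1)                # (N, 1, T*H*W)
        attn = torch.softmax(logits, dim=-1).view(n, 1, t, h, w)
        return x * attn                                      # attention-enhanced features


if __name__ == "__main__":
    feats = torch.randn(2, 64, 8, 14, 14)      # dummy clip features
    out = SpatioTemporalSoftAttention(64)(feats)
    print(out.shape)                           # torch.Size([2, 64, 8, 14, 14])
```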

Citations: Cited by 10 publications (6 citation statements)
References: 50 publications
“…For supervised feature extraction, CNNs are by far the more reliable choice; however, the case presented in this study is an unsupervised one, which is implied by the multiple unlabelled IMFs from the pressure signals. This presents an opportunity for the SAE to flourish, since they are efficient for learning deep feature representations from multiple inputs [20]. The deep feature learning capabilities of SAEs have been recorded for many purposes, including epileptic seizure detection [21], rotating machinery prognostics [6], anomaly detection [14], and a host of other applications.…”
Section: Related Work
confidence: 99%
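The citing passage credits stacked autoencoders (SAEs) with learning deep feature representations from multiple unlabelled inputs. As a rough illustration of that idea only, the sketch below pretrains a single autoencoder layer by reconstruction on unlabelled vectors; the layer sizes, optimizer, and training loop are assumptions for illustration, not the cited study's configuration.

```python
import torch
import torch.nn as nn


class AutoencoderLayer(nn.Module):
    """One layer of a stacked autoencoder: encode, decode, reconstruct."""

    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.Linear(hidden_dim, in_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def pretrain_layer(layer, data, epochs=10, lr=1e-3):
    """Unsupervised reconstruction training on unlabelled vectors (e.g. IMF segments)."""
    opt = torch.optim.Adam(layer.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(layer(data), data)   # reconstruction error, no labels needed
        loss.backward()
        opt.step()
    return layer.encoder(data).detach()      # features fed to the next stacked layer


if __name__ == "__main__":
    x = torch.randn(256, 128)                # dummy unlabelled inputs
    feats = pretrain_layer(AutoencoderLayer(128, 32), x)
    print(feats.shape)                       # torch.Size([256, 32])
```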
“…TS+LST [1]: UCF101/HMDB51 94.8%/70.2%; AE-I3D [19]: UCF101/HMDB51 95.9%/74.7%; KF+SAMA [41]: UCF101 95.9%…”
Section: Trajectory
confidence: 99%
“…[15], [17], [18] use attention-based multi-layered Recurrent Neural Network (RNN) models with Long Short-Term Memory (LSTM) to improve the performance of their algorithms. Shi, Z. et al. [19] proposed the AE-I3D (Attention-Enhanced I3D) network for action recognition; the idea of AE-I3D is to enhance the spatiotemporal representation by inflating soft attention in the spatiotemporal scope and adopting softmax to generate a probability distribution over the attentional features.…”
Section: Introduction
confidence: 99%
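The passage above contrasts AE-I3D with attention-equipped multi-layer LSTM models. Purely as an illustration of the latter family, the sketch below applies softmax attention over the hidden states of a stacked LSTM to pool per-frame features into a clip-level descriptor; the dimensions and the linear scoring layer are assumptions, not any cited paper's design.

```python
import torch
import torch.nn as nn


class AttentiveLSTM(nn.Module):
    """Stacked LSTM whose per-frame outputs are pooled by softmax attention."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)    # one attention score per time step

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, T, feat_dim) per-frame features
        outputs, _ = self.lstm(frames)                         # (N, T, hidden_dim)
        weights = torch.softmax(self.score(outputs), dim=1)    # (N, T, 1)
        return (weights * outputs).sum(dim=1)                  # (N, hidden_dim) clip descriptor


if __name__ == "__main__":
    clip = torch.randn(4, 16, 512)             # 4 clips, 16 frames, 512-d features
    pooled = AttentiveLSTM(512, 256)(clip)
    print(pooled.shape)                        # torch.Size([4, 256])
```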
“…As reported in [1], the test classification accuracy on ImageNet was improved substantially compared with other methods at that time. Furthermore, besides classification tasks, the attention mechanism has also been used in many other tasks such as object detection [3], [4], semantic segmentation [5], [6], super resolution [7], [8], action recognition [9], [10], etc. As the most popular attention method, the SE technique used pooling operators to obtain the invariant feature of each channel, bringing in nonlinearity at the same time.…”
Section: Introduction
confidence: 99%
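The passage summarizes SE (Squeeze-and-Excitation) as pooling a per-channel statistic and applying a nonlinear gate to reweight channels. Below is a compact sketch of that standard SE block for 2D features; the reduction ratio and layer sizes are conventional defaults, not values taken from the cited work.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global-pool each channel, then gate channels."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # "squeeze": one value per channel
        self.fc = nn.Sequential(                      # "excitation": nonlinear channel gating
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * gate                               # channel-reweighted features


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(SEBlock(64)(feats).shape)                   # torch.Size([2, 64, 32, 32])
```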