2019
DOI: 10.1109/access.2019.2923651
R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition

Abstract: Two-stream network architecture has the ability to capture temporal and spatial features from videos simultaneously and has achieved excellent performance on video action recognition tasks. However, there is a fair amount of redundant information in both temporal and spatial dimensions in videos, which increases the complexity of network learning. To solve this problem, we propose residual spatial-temporal attention network (R-STAN), a feed-forward convolutional neural network using residual learning and spati…
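The abstract describes re-weighting video features with a spatial-temporal attention mask combined through a residual connection. Below is a minimal NumPy sketch of that idea, assuming the `out = x * (1 + mask)` residual form popularized by residual attention networks; the energy-based mask and all shapes here are illustrative stand-ins for R-STAN's learned attention, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention_block(features):
    """Residual spatial-temporal attention re-weighting (illustrative).

    features: array of shape (T, H, W, C) -- frames, spatial dims, channels.
    A per-location energy stands in for a learned attention branch; the
    residual form out = x * (1 + mask) lets the block emphasize salient
    regions while never zeroing out the identity signal.
    """
    # Per-location energy summed over channels -> shape (T, H, W)
    energy = (features ** 2).sum(axis=-1)
    # Normalize jointly over the whole spatial-temporal volume
    mask = softmax(energy.reshape(-1)).reshape(energy.shape)
    # Residual re-weighting: identity plus attention-modulated features
    return features * (1.0 + mask[..., None])

x = np.random.rand(4, 8, 8, 16)   # 4 frames, 8x8 spatial grid, 16 channels
y = residual_attention_block(x)
print(y.shape)  # (4, 8, 8, 16)
```

Because the mask is non-negative, the residual form only amplifies features relative to the identity path, which is what makes such blocks easy to stack without degrading the backbone signal.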

Cited by 38 publications (15 citation statements)
References 39 publications
“…We compared STRM using closed-set based methods and open-set based methods. The closed-set methods were: iDT [18], Two-stream [31], FstCN [71], MoFAP [72], MIFS [8], LTC [34], R-STAN [73], ST-Pyramid Network [74], ATW [75], DOVF [76], Four-Stream [77], TLE [78], and DTPP [79]. The open-set methods were: ODN [43], P-ODN [44], SDMM [48], and Mishra et al [47].…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%
“…In the image translation task, Sun et al. [39] designed a channel attention network that better integrates the original function in the encoder with the conversion function in the decoder. In addition, Liu et al. [40] proposed a spatiotemporal attention module for video action recognition. Gao et al. [41] introduced a residual attention mechanism into a single-convolutional-layer object-tracking network to avoid data imbalance.…”
Section: Attention Mechanism (mentioning)
confidence: 99%
“…Previously, 2D convolutional neural networks [27], [28] trained on ImageNet [29] were usually exploited for RGB image classification. However, for the task of video classification, appearance information alone is not enough, and dynamic feature representations play a vital role in recognition [9], [30]. To capture motion information, K. Simonyan et al. proposed a two-stream ConvNet architecture that incorporates spatial and temporal networks [8], where the temporal stream is trained to recognize actions from motion in the form of dense optical flow.…”
Section: Related Work (mentioning)
confidence: 99%
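The passage above describes the two-stream architecture: a spatial (RGB) stream and a temporal (optical-flow) stream whose class scores are combined. A common combination is late score fusion, sketched below with hypothetical logits and a hypothetical temporal weight; neither value comes from the cited papers.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax for a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def two_stream_fusion(spatial_logits, temporal_logits, w_temporal=1.5):
    """Late fusion of two-stream action scores.

    Combines per-class probabilities from a spatial (RGB) stream and a
    temporal (optical-flow) stream. Weighting the temporal stream above
    1.0 follows the common practice of trusting motion cues more; the
    exact weight is a tunable hyperparameter, not a published value.
    """
    fused = softmax(spatial_logits) + w_temporal * softmax(temporal_logits)
    return int(np.argmax(fused))

spatial = np.array([2.0, 1.0, 0.1])    # hypothetical RGB-stream logits
temporal = np.array([0.5, 3.0, 0.2])   # hypothetical flow-stream logits
print(two_stream_fusion(spatial, temporal))  # class index 1
```

Here the flow stream's strong vote for class 1 outweighs the RGB stream's preference for class 0, illustrating why the temporal stream often dominates on motion-heavy actions.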