2016
DOI: 10.1007/978-3-319-46484-8_2
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Abstract: Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles for designing effective ConvNet architectures for action recognition in videos and to learn these models given limited training samples. Our first contribution is the temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the i…


Cited by 2,937 publications (3,368 citation statements)
References 33 publications
“…More precisely, we used the software tool (https://github.com/yjxiong/dense_flow/tree/opencv-3.1) provided by Wang et al. in [36] to compute the optical flow images (see Figure 2 for an example of its output). We kept the original optical flow computation parameters of Wang et al. to replicate their results in action recognition.…”
Section: The Optical Flow Images
confidence: 99%
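The dense_flow tool referenced above wraps OpenCV's dense optical flow and stores each flow component as an 8-bit grayscale image. A minimal sketch of that quantization step, assuming the displacement bound of 20 pixels commonly used in the TSN extraction scripts (the bound value is an assumption here, not taken from the quoted text):

```python
import numpy as np

def quantize_flow(flow, bound=20):
    """Clip flow displacements to [-bound, bound] and linearly map them
    to [0, 255], so the x/y flow components can be saved as grayscale
    JPEGs for the temporal stream. `bound=20` is an assumed default."""
    clipped = np.clip(flow, -bound, bound)
    return np.round((clipped + bound) * (255.0 / (2 * bound))).astype(np.uint8)

# Zero motion maps to the midpoint of the 8-bit range.
zero_flow = np.zeros((4, 4, 2), dtype=np.float32)
print(quantize_flow(zero_flow)[0, 0, 0])  # 128
```

Displacements beyond the bound saturate at 0 or 255, which keeps rare large motions from dominating the quantization range.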
“…For HMDB51, we use the model pre-trained on UCF101, then follow the same process as UCF101, the training ends at 10K and 9K in the motion-segment pre-train and uniform-segment fine-tune stages. We employ scale-jittering [10] in four spatial scales {240, 224, 192, 168}. For joint training, we set the weight of {video, sub-video} and {sub-video, sub-video} networks to 0.7 and 0.3.…”
Section: Implementation Details
confidence: 99%
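The scale-jittering described above samples a crop size from the scale set {240, 224, 192, 168} and a crop position before resizing to the network input. A rough sketch of that sampling, assuming TSN-style corner-plus-center cropping (the function name and the corner/center position set are illustrative, not from the quoted text):

```python
import random

SCALES = [240, 224, 192, 168]  # crop sizes from the quoted setup

def sample_jittered_crop(width, height, scales=SCALES, seed=None):
    """Pick a random crop size from `scales` and a random position among
    the four corners and the center; the crop would then be resized to
    the network input resolution. A sketch, not the exact TSN code."""
    rng = random.Random(seed)
    size = rng.choice(scales)
    positions = [
        (0, 0),                                       # top-left
        (width - size, 0),                            # top-right
        (0, height - size),                           # bottom-left
        (width - size, height - size),                # bottom-right
        ((width - size) // 2, (height - size) // 2),  # center
    ]
    x, y = rng.choice(positions)
    return x, y, size
```

Restricting positions to corners and center (rather than fully random locations) is the usual trick to keep jittered crops from clustering near the image center.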
“…One is the two-stream convolutional network [8], whose structure is BN-Inception [9] initialized with a model pre-trained on the Kinetics dataset. The other is the multi-layer recurrent network for skeletal data processing.…”
Section: TSN2 Model
confidence: 99%
“…The basic structure of the TSN model proposed in [8] is a two-stream convolutional neural network. Two-stream networks [1] include two convolutional networks, a spatial network and a temporal network, combining spatial and temporal information.…”
Section: A Two-Stream Convolutional Neural Network
confidence: 99%
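The spatial and temporal streams described above are typically combined by late fusion: a weighted average of the per-class scores from the two networks. A minimal sketch, assuming a 1:1.5 spatial-to-temporal weighting in the spirit of common two-stream practice (the exact weights are an illustrative assumption):

```python
import numpy as np

def fuse_two_stream(spatial_scores, temporal_scores,
                    w_spatial=1.0, w_temporal=1.5):
    """Late fusion of two-stream per-class scores via a weighted sum,
    returning the index of the predicted class. The default weights are
    an assumption for illustration, not taken from the quoted text."""
    fused = (w_spatial * np.asarray(spatial_scores, dtype=np.float64)
             + w_temporal * np.asarray(temporal_scores, dtype=np.float64))
    return int(np.argmax(fused))

# Temporal evidence outweighs spatial evidence under the 1:1.5 weighting.
print(fuse_two_stream([0.1, 0.9], [0.8, 0.2]))  # 0
```

Because fusion happens on scores rather than features, the two streams can be trained independently and combined without retraining.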