2016
DOI: 10.1007/978-3-319-46484-8_2
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

Abstract: Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles for designing effective ConvNet architectures for action recognition in videos and to learn these models given limited training samples. Our first contribution is the temporal segment network (TSN), a novel framework for video-based action recognition, which is based on the i…


Cited by 2,937 publications (3,368 citation statements)
References 33 publications
“…More precisely, we used the software tool (https://github.com/yjxiong/dense_flow/tree/opencv-3.1) provided by Wang et al. in [36] to compute the optical flow images (see Figure 2 for an example of its output). We kept the original optical flow computation parameters of Wang et al. to replicate their results in action recognition.…”
Section: The Optical Flow Images
confidence: 99%
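The dense_flow tool referenced above wraps OpenCV's dense optical flow and stores each flow component as an 8-bit grayscale image. A minimal sketch of that quantization step, assuming the displacement bound of 20 pixels commonly used in the TSN extraction scripts (the bound value is an assumption here, not taken from the quoted text):

```python
import numpy as np

def quantize_flow(flow, bound=20):
    """Clip flow displacements to [-bound, bound] and linearly map them
    to [0, 255], so the x/y flow components can be saved as grayscale
    JPEGs for the temporal stream. `bound=20` is an assumed default."""
    clipped = np.clip(flow, -bound, bound)
    return np.round((clipped + bound) * (255.0 / (2 * bound))).astype(np.uint8)

# Zero motion maps to the midpoint of the 8-bit range.
zero_flow = np.zeros((4, 4, 2), dtype=np.float32)
print(quantize_flow(zero_flow)[0, 0, 0])  # 128
```

Displacements beyond the bound saturate at 0 or 255, which keeps rare large motions from dominating the quantization range.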
“…For HMDB51, we use the model pre-trained on UCF101, then follow the same process as UCF101, the training ends at 10K and 9K in the motion-segment pre-train and uniform-segment fine-tune stages. We employ scale-jittering [10] in four spatial scales {240, 224, 192, 168}. For joint training, we set the weight of {video, sub-video} and {sub-video, sub-video} networks to 0.7 and 0.3.…”
Section: Implementation Details
confidence: 99%
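The scale-jittering described above samples a crop size from the scale set {240, 224, 192, 168} and a crop position before resizing to the network input. A rough sketch of that sampling, assuming TSN-style corner-plus-center cropping (the function name and the corner/center position set are illustrative, not from the quoted text):

```python
import random

SCALES = [240, 224, 192, 168]  # crop sizes from the quoted setup

def sample_jittered_crop(width, height, scales=SCALES, seed=None):
    """Pick a random crop size from `scales` and a random position among
    the four corners and the center; the crop would then be resized to
    the network input resolution. A sketch, not the exact TSN code."""
    rng = random.Random(seed)
    size = rng.choice(scales)
    positions = [
        (0, 0),                                       # top-left
        (width - size, 0),                            # top-right
        (0, height - size),                           # bottom-left
        (width - size, height - size),                # bottom-right
        ((width - size) // 2, (height - size) // 2),  # center
    ]
    x, y = rng.choice(positions)
    return x, y, size
```

Restricting positions to corners and center (rather than fully random locations) is the usual trick to keep jittered crops from clustering near the image center.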
“…One is the two-stream convolutional network [8], whose structure is BN-Inception [9] initialized with a model pre-trained on the Kinetics dataset. The other is the multi-layer recurrent network for skeletal data processing.…”
Section: TSN2 Model
confidence: 99%
“…The basic structure of the TSN model proposed in [8] is a two-stream convolutional neural network. Two-stream networks [1] include two convolutional networks, a spatial network and a temporal network, combining spatial and temporal information.…”
Section: A Two-Stream Convolutional Neural Network
confidence: 99%
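The spatial and temporal streams described above are typically combined by late fusion: a weighted average of the per-class scores from the two networks. A minimal sketch, assuming a 1:1.5 spatial-to-temporal weighting in the spirit of common two-stream practice (the exact weights are an illustrative assumption):

```python
import numpy as np

def fuse_two_stream(spatial_scores, temporal_scores,
                    w_spatial=1.0, w_temporal=1.5):
    """Late fusion of two-stream per-class scores via a weighted sum,
    returning the index of the predicted class. The default weights are
    an assumption for illustration, not taken from the quoted text."""
    fused = (w_spatial * np.asarray(spatial_scores, dtype=np.float64)
             + w_temporal * np.asarray(temporal_scores, dtype=np.float64))
    return int(np.argmax(fused))

# Temporal evidence outweighs spatial evidence under the 1:1.5 weighting.
print(fuse_two_stream([0.1, 0.9], [0.8, 0.2]))  # 0
```

Because fusion happens on scores rather than features, the two streams can be trained independently and combined without retraining.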