Two-Stream Temporal Convolutional Networks for Skeleton-Based Human Action Recognition

Jia, Jin-Gong; Zhou, Yuanfeng; Hao, Xingwei; Li, Feng; Desrosiers, Christian; Zhang, Caiming

doi:10.1007/s11390-020-0405-6

Cited by 22 publications

(23 citation statements)

References 38 publications

“…To collect 3D skeleton data, RGB deep images are captured by the Microsoft Kinect sensor. This method is one of the most popular to estimate 3D human pose [5, 16, 18]. The method converts 2D image detections from multiple camera views into 3D images [28, 29, 30].…”

Section: Related Work (mentioning)

confidence: 99%

Student Behavior Recognition System for the Classroom Environment Based on Skeleton Pose Estimation and Person Detection

Lin, Ngo, Dow, et al. 2021

Sensors

Human action recognition has attracted considerable research attention in the field of computer vision, especially for classroom environments. However, most relevant studies have focused on one specific behavior of students. Therefore, this paper proposes a student behavior recognition system based on skeleton pose estimation and person detection. First, consecutive frames captured with a classroom camera were used as the input images of the proposed system. Then, skeleton data were collected using the OpenPose framework. An error correction scheme was proposed based on the pose estimation and person detection techniques to decrease incorrect connections in the skeleton data. The preprocessed skeleton data were subsequently used to eliminate several joints that had a weak effect on behavior classification. Second, feature extraction was performed to generate feature vectors that represent human postures. The adopted features included normalized joint locations, joint distances, and bone angles. Finally, behavior classification was conducted to recognize student behaviors. A deep neural network was constructed to classify actions, and the proposed system was able to identify the number of students in a classroom. Moreover, a system prototype was implemented to verify the feasibility of the proposed system. The experimental results indicated that the proposed scheme outperformed the skeleton-based scheme in complex situations. The proposed system had a 15.15% higher average precision and 12.15% higher average recall than the skeleton-based scheme did.
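The three feature types named in this abstract (normalized joint locations, joint distances, and bone angles) can be illustrated with a minimal sketch. It assumes 2D joints given as (x, y) tuples; the function names, the centroid-and-extent normalization, and the choice of joint pairs and triples are illustrative assumptions, not the citing paper's actual definitions.

```python
import math

def normalize_joints(joints):
    """Center joints on their centroid and scale by the largest extent (assumed scheme)."""
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    # Fall back to 1.0 when all joints coincide, to avoid division by zero.
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in joints]

def joint_distance(a, b):
    """Euclidean distance between two joints."""
    return math.dist(a, b)

def bone_angle(a, b, c):
    """Angle in degrees at joint b between the bones b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate bone: angle is undefined, report 0 by convention
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def feature_vector(joints, pairs, triples):
    """Concatenate normalized coordinates, selected distances, and selected angles."""
    norm = normalize_joints(joints)
    feats = [coord for joint in norm for coord in joint]
    feats += [joint_distance(norm[i], norm[j]) for i, j in pairs]
    feats += [bone_angle(norm[i], norm[j], norm[k]) for i, j, k in triples]
    return feats
```

Calling `feature_vector(joints, pairs, triples)` returns a flat list that a classifier could consume; which pairs and triples are informative depends on the skeleton layout and is left to the caller here.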

Section: Related Work (mentioning)

confidence: 99%

“…The skeleton data representation of human poses in videos is a popular technique for action recognition [5, 15, 16, 17, 18, 19]. In this technique, the main task is to identify the skeleton data, including the detailed location of joints.…”

Section: Introduction (mentioning)

confidence: 99%

Student Behavior Recognition System for the Classroom Environment Based on Skeleton Pose Estimation and Person Detection

Lin, Ngo, Dow, et al. 2021

Sensors

“…Jia et al [9] introduced a new variant of feature representation for the skeleton-based human action recognition problem. They divided the usual vector representation for a human skeleton into five relevant joint subgroups, namely the left arm, right arm, left leg, right leg and trunk and then those parts were linked together into a whole body with the head.…”

Section: Related Work (mentioning)

confidence: 99%
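The part-based grouping described in the statement above can be sketched as follows. This is only an illustration under stated assumptions: the 16-joint layout, the index assignments in `BODY_PARTS` and `HEAD`, and the function names are hypothetical and do not reproduce the joint numbering of any particular dataset or of the cited paper.

```python
# Hypothetical 16-joint layout: index 0 is the head, the rest form five parts.
HEAD = [0]
BODY_PARTS = {
    "trunk": [1, 2, 3],
    "left_arm": [4, 5, 6],
    "right_arm": [7, 8, 9],
    "left_leg": [10, 11, 12],
    "right_leg": [13, 14, 15],
}

def split_into_parts(frame):
    """Split one frame (a list of joint coordinates) into the five part subgroups."""
    return {name: [frame[i] for i in idx] for name, idx in BODY_PARTS.items()}

def whole_body(frame):
    """Link the head and the five parts back into one whole-body joint list."""
    parts = split_into_parts(frame)
    body = [frame[i] for i in HEAD]
    for name in BODY_PARTS:
        body.extend(parts[name])
    return body
```

Each subgroup can then be fed to its own feature extractor before the whole-body representation is assembled; how the parts are actually combined in the cited work is not specified in the quoted text.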

“…Action—actions are single-person activities that may be composed of multiple gestures organized temporally. Most datasets [2, 3, 4, 5, 6] and most proposed solutions [7, 8, 9, 10, 11] are focused on this category.…”

Section: Introduction (mentioning)

confidence: 99%

Comparison between Recurrent Networks and Temporal Convolutional Networks Approaches for Skeleton-Based Action Recognition

Nan, Trăşcău, Florea, et al. 2021

Sensors

Action recognition plays an important role in various applications such as video monitoring, automatic video indexing, crowd analysis, human-machine interaction, smart homes and personal assistive robotics. In this paper, we propose improvements to some methods for human action recognition from videos that work with data represented in the form of skeleton poses. These methods are based on the most widely used techniques for this problem—Graph Convolutional Networks (GCNs), Temporal Convolutional Networks (TCNs) and Recurrent Neural Networks (RNNs). Initially, the paper explores and compares different ways to extract the most relevant spatial and temporal characteristics for a sequence of frames describing an action. Based on this comparative analysis, we show how a TCN type unit can be extended to work even on the characteristics extracted from the spatial domain. To validate our approach, we test it against a benchmark often used for human action recognition problems and we show that our solution obtains comparable results to the state-of-the-art, but with a significant increase in the inference speed.

“…Even though skeleton pose estimation is a structured data type, several methods approached the problem with 2D ConvNets [74][75][76][77]. Li et al [77] proposed a two-stream 2D ConvNet: one to extract features from spatial coordinates of the pose in a 3D manner (position, joints and frames) through a skeleton transformer module, which extracts weighted interpolated joints matrix.…”

Section: Global Representations (mentioning)

confidence: 99%

Human Behavior Analysis: A Survey on Action Recognition

Degardin, Proença 2021

Applied Sciences

The visual recognition and understanding of human actions remain an active research domain of computer vision, being the scope of various research works over the last two decades. The problem is challenging due to its many interpersonal variations in appearance and motion dynamics between humans, without forgetting the environmental heterogeneity between different video images. This complexity splits the problem into two major categories: action classification, recognising the action being performed in the scene, and spatiotemporal action localisation, concerning recognising multiple localised human actions present in the scene. Previous surveys mainly focus on the evolution of this field, from handcrafted features to deep learning architectures. However, this survey presents an overview of both categories and respective evolution within each one, the guidelines that should be followed and the current benchmarks employed for performance comparison between the state-of-the-art methods.
