2021
DOI: 10.3390/s21062051

Comparison between Recurrent Networks and Temporal Convolutional Networks Approaches for Skeleton-Based Action Recognition

Abstract: Action recognition plays an important role in various applications such as video monitoring, automatic video indexing, crowd analysis, human-machine interaction, smart homes and personal assistive robotics. In this paper, we propose improvements to some methods for human action recognition from videos that work with data represented in the form of skeleton poses. These methods are based on the most widely used techniques for this problem—Graph Convolutional Networks (GCNs), Temporal Convolutional Networks (TCN…

Cited by 28 publications (23 citation statements)
References 48 publications (61 reference statements)
“…Noise usually accompanies images during acquisition or transmission, resulting in contrast reduction, color shift, and poor visual quality. The interference of noise not only contaminates the naturalness of an image, but also damages the precision of various computer vision-based applications, such as semantic segmentation [ 1 , 2 ], motion tracking [ 3 , 4 ], action recognition [ 5 , 6 ], and object detection [ 7 , 8 , 9 , 10 , 11 , 12 ], to name a few. Consequently, noise removal for these applications has attracted great interest as a preprocessing task over the last two decades.…”
Section: Introduction
confidence: 99%
“…A spatial-temporal two-stream transformer network [ 32 ] is proposed to model dependencies between joints using the Transformer self-attention operator. Additionally, some work [ 34 ] has been done to explore and compare different ways of extracting human pose features, and to extend a TCN-like unit to extract the most relevant spatial and temporal characteristics for a sequence of frames.…”
Section: Related Work
confidence: 99%
“…A TCN addresses the caveats of recurrent sequence models such as the LSTM or the Gated Recurrent Unit (GRU) when learning very long sequences [36]. Its advantages include mitigation of the vanishing/exploding-gradient problem that arises when back-propagating through time, as often encountered with the LSTM; reduced memory usage, training time, and inference time compared with traditional RNN architectures [37]; and, compared to the LSTM, fewer trainable parameters needed to store intermediate results [35]. To elaborate, the 1D convolution adopted in the TCN shares its learned filters across the entire input feature map of length l for each input channel c. This can be attributed to the parallelism of the convolution operation.…”
Section: A. Principles of Temporal Convolutional Network
confidence: 99%
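The weight sharing described in the excerpt above can be illustrated with a minimal sketch of a dilated causal 1D convolution in plain Python. This is an illustration of the general technique, not code from the cited works; the function name and signature are invented for this example:

```python
def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1D convolution over a sequence x.

    The same kernel (shared weights) slides over the whole input,
    so the parameter count is independent of the sequence length.
    output[t] depends only on x[t], x[t-d], x[t-2d], ... -- never
    on future timesteps, which makes the convolution causal.
    """
    k = len(kernel)
    pad = (k - 1) * dilation          # left-pad so output length == input length
    padded = [0] * pad + list(x)
    out = []
    for t in range(len(x)):
        # tap positions t, t-d, t-2d, ... spaced by the dilation factor;
        # each output step is independent, so the loop parallelizes trivially
        out.append(sum(kernel[i] * padded[t + pad - i * dilation]
                       for i in range(k)))
    return out
```

For example, `causal_conv1d([1, 2, 3, 4], [1, 1])` sums each element with its predecessor, while `dilation=2` reaches two steps back, widening the receptive field without adding parameters.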
“…The TCN, initially presented by [35], addresses the above shortcomings. A TCN performs dilated, causal convolution, transforming CNNs into highly efficient, auto-regressive models, as evidenced by [35]–[37]. Unlike, e.g., the LSTM, a TCN can be trained on input sequences of arbitrary length, as the number of trainable parameters per layer depends only on the number of input features, filters, and the kernel size.…”
Section: Introduction
confidence: 99%
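The ability to cover arbitrarily long input sequences with a fixed parameter budget comes from stacking dilated layers. Assuming the common dilation-doubling schedule (1, 2, 4, …), which the excerpt does not fix, the receptive field grows exponentially with depth; the helper below is an illustrative sketch, not from the cited works:

```python
def tcn_receptive_field(kernel_size, num_layers):
    """Receptive field (in timesteps) of a stack of dilated causal
    convolution layers with dilations 1, 2, 4, ..., 2**(num_layers - 1).

    Each layer with kernel size k and dilation d extends the receptive
    field by (k - 1) * d, so the total is 1 + (k - 1) * (2**L - 1).
    """
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))
```

So four layers with kernel size 3 already see 31 timesteps, while the per-layer parameter count stays constant regardless of how long the input sequence is.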