2021
DOI: 10.1007/s11263-021-01436-0

Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions

Abstract: The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we show a systematic design (from 2D to 3D) for how conventional networks and other forms of constraints can be incorporated into the attention framework for learning long-range dependencies for the task of pose estimation. The contribution of this paper is to provide a systematic approach for designing and training of attention-based models for the end-to-en…

Citations: cited by 20 publications (6 citation statements)
References: 49 publications
“…Sequential networks are used in pose estimation to ‘lift’ an estimated 2D pose to 3D [ 30 , 31 ]. Recent research combines temporal information with lifting to improve accuracy during frames where one limb occludes another in the view of the camera (self-occlusion) or where not all key points were detected [ 15 , 32 , 33 ]. In contrast to CNNs, these sequential networks do not take a single frame as input but exploit temporal dependencies in the data for their prediction.…”
Section: Methods (mentioning)
confidence: 99%
“…The classification of body posture construction using the K-NN method, although more straightforward than the application of facial recognition, has accurate results [25]. Deep learning [22], multi-scale temporal features, spatio-temporal KCS pose differentiation, and occlusion data augmentation [29] have been used for the 2D to 3D development of human pose estimation [30,31]. Other methods use attention models [32] and multi-scale networks with phase inference optimization [33], introducing many parameters requiring manual tuning.…”
Section: Related Work (mentioning)
confidence: 99%
“…Other methods use attention models [32] and multi-scale networks with phase inference optimization [33], introducing many parameters requiring manual tuning. The performance of the graphical model-based approach has been surpassed by convolutional neural networks (CNNs) [31,34].…”
Section: Related Work (mentioning)
confidence: 99%
“…Human pose estimation Human pose estimation has attracted a lot of research interests in recent years (Yi, Zhou, and Xu 2021; Benzine et al 2021; Xu and Takano 2021; Gong, Zhang, and Feng 2021; Yuan et al 2021). In general, existing human pose estimation methods can be divided into two categories: bottom-up methods (Cao et al 2017; Kocabas, Karagoz, and Akbas 2018; Kreiss, Bertoni, and Alahi 2019; Li et al 2019a; Liu et al 2021a) and top-down methods (Fang et al 2017; Xiao, Wu, and Wei 2018; Wei et al 2016; Sun et al 2019; Moon, Chang, and Lee 2019; Benzine et al 2021). Human pose estimation works have been deployed into many applications such as digital human driven.…”
Section: Related Work (mentioning)
confidence: 99%