2020
DOI: 10.1109/tpami.2020.2976014

Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition

Abstract: Human pose estimation and action recognition are related tasks since both problems are strongly dependent on the human body representation and analysis. Nonetheless, most recent methods in the literature handle the two problems separately. In this work, we propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. We show that a single architecture can be used to solve both problems in an efficient way and still ach…

Citations: cited by 88 publications (50 citation statements)
References: 68 publications
“…3. Inspired by multi-task deep learning [30], we first train the mask generator only with L_BCE for 100 epochs. Then we freeze all parameters of the object detector and train the remaining network with L_Total for around 200 epochs.…”
Section: E Supervision Strategy (mentioning)
confidence: 99%
“…In the work of Luvizon et al [55], they propose a multi-task framework for jointly estimating 2D or 3D human poses from monocular images. The architecture is composed of prediction blocks, downscaling and upscaling units, and simple connections.…”
Section: Fully-supervised (mentioning)
confidence: 99%
“…Mobile edge computing in video but used frame by frame as image, sensors of multifunctional simulated by smart phone to gather information from human body to recognize the activities in medical issue [29]. Monocular images extracted from video in 3D and 2D poses to verified from which activity was presented in [30]; the authors used two high parameters in still and video images. Random forest model was presented to enhance the deep learning, and 40 activities were recognized with good performance of HAR system [31].…”
Section: Literature Review (mentioning)
confidence: 99%