2015
DOI: 10.1007/978-3-319-16808-1_21
|View full text |Cite
|
Sign up to set email alerts
|

MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

Abstract: In this work, we propose a novel and efficient method for articulated human pose estimation in videos using a convolutional network architecture, which incorporates both color and motion features. We propose a new human body pose dataset, FLIC-motion 1 , that extends the FLIC dataset [1] with additional motion features. We apply our architecture to this dataset and report significantly better performance than current state-of-the-art pose detection systems.

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
98
0

Year Published

2016
2016
2019
2019

Publication Types

Select...
6
1
1

Relationship

0
8

Authors

Journals

citations
Cited by 114 publications
(101 citation statements)
references
References 60 publications
(84 reference statements)
0
98
0
Order By: Relevance
“…[28][29][30] Also, there is the more challenging task of simultaneous annotation of multiple people [17,31]. In addition, there is work like that of Oliveira et al [32] that performs human part segmentation based on fully convolutional networks [23].…”
Section: Related Workmentioning
confidence: 99%
“…[28][29][30] Also, there is the more challenging task of simultaneous annotation of multiple people [17,31]. In addition, there is work like that of Oliveira et al [32] that performs human part segmentation based on fully convolutional networks [23].…”
Section: Related Workmentioning
confidence: 99%
“…Single person pose estimation in videos has also been studied extensively in the literature [28,9,46,33,46,20,44,29,13,18]. These approaches mainly aim to improve pose estimation by utilizing temporal smoothing constraints [28,9,44,33,13] and/or optical flow information [46,20,29], but they are not directly applicable to videos with multiple potentially occluding persons.…”
Section: Related Workmentioning
confidence: 99%
“…These approaches mainly aim to improve pose estimation by utilizing temporal smoothing constraints [28,9,44,33,13] and/or optical flow information [46,20,29], but they are not directly applicable to videos with multiple potentially occluding persons.…”
Section: Related Workmentioning
confidence: 99%
See 1 more Smart Citation
“…Convolutional neural networks (CNN) [1,2] have recently demonstrated superior performance on many tasks such as image classification [3,4,5], object detection [6,7,8,9,10,11], object tracking [12,13,14], text detection [15,16], text recognition [17,18,19], local feature description [20], video classification [21,22,23], human pose estimation [24,25,26], scene recognition [27,28] and scene labelling [29,30].…”
Section: Introductionmentioning
confidence: 99%