2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015
DOI: 10.1109/cvpr.2015.7299005

Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras

Abstract: We present a novel method for accurate marker-less capture of articulated skeleton motion of several subjects in general scenes, indoors and outdoors, even from input filmed with as few as two cameras. Our approach unites a discriminative image-based joint detection method with a model-based generative motion tracking algorithm through a combined pose optimization energy. The discriminative part-based pose detection method, implemented using Convolutional Networks (ConvNet), estimates unary potentials for each…

Cited by 134 publications (102 citation statements)
References 41 publications (72 reference statements)
“…Multi-view 3D human pose: Markerless motion capture has been investigated in computer vision for a decade. Early works on this problem aim to track the 3D skeleton or geometric model of human body through a multi-view sequence [38,43,11]. These tracking-based methods require initialization in the first frame and are prone to local optima and tracking failures.…”
Section: Related Work (mentioning)
confidence: 99%
“…For example, the most widely used system [2] needs multiple calibrated cameras with reflective markers carefully attached to the subjects' body. The actively-studied markerless approaches are also based on multi-view systems [18,26,16,22,23] or depth cameras [46,7]. For this reason, the amount of available 3D motion data is extremely limited.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although there exists a commercial solution that uses marker-less multi-camera systems to obtain highly precise skeleton data at 120 frames per second (FPS) and approximately 25-50ms latency [99], computing depth maps is usually slow and often suffers from problems such as failures of correspondence search and noisy depth information. To address these problems, algorithms were also studied to construct human skeleton models directly from the multi-images without calculating the depth image [80,81,82]. For example, Gall et al [81] introduced an approach to fully-automatically estimate the 3D skeleton model from a multi-perspective video sequence, where an articulated template model and silhouettes are obtained from the sequence.…”
Section: Construction from RGB Imagery (mentioning)
confidence: 99%