Capturing accurate 3D human performances in global space from a static monocular video is an ill-posed problem. It requires resolving various depth ambiguities as well as knowledge of the camera's intrinsics and extrinsics. Therefore, most methods either learn on a fixed set of cameras or require the camera's parameters to be known. We instead show that a camera's extrinsics and intrinsics can be regressed jointly with the human's position in global space, joint angles, and body shape from nothing but long sequences of 2D motion estimates. We exploit the constant parameters of a static camera by training a model that can be applied to sequences of arbitrary length in a single forward pass while allowing full bidirectional information flow. We show that full temporal information flow is especially necessary when improving consistency through an adversarial network. Our training dataset is exclusively synthetic, and no domain adaptation is used. We achieve one of the best Human3.6M joint error results among models that do not use the Human3.6M training data.
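The core idea of the abstract can be illustrated with a minimal sketch: a bidirectional recurrent model consumes a 2D keypoint sequence of any length in a single forward pass, emits per-frame pose parameters, and pools over time to predict one shared camera vector, reflecting the static-camera assumption. All names, dimensions, and the RNN formulation below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def bidirectional_regressor(keypoints_2d, params):
    """Toy sketch (not the paper's model): regress per-frame pose and a single
    shared camera vector from a 2D keypoint sequence of arbitrary length T."""
    T, _ = keypoints_2d.shape
    H = params["W_f"].shape[0]
    h_f = np.zeros(H)
    h_b = np.zeros(H)
    fwd = np.zeros((T, H))
    bwd = np.zeros((T, H))
    for t in range(T):                      # forward-in-time recurrence
        h_f = np.tanh(params["W_f"] @ keypoints_2d[t] + params["U_f"] @ h_f)
        fwd[t] = h_f
    for t in reversed(range(T)):            # backward-in-time recurrence
        h_b = np.tanh(params["W_b"] @ keypoints_2d[t] + params["U_b"] @ h_b)
        bwd[t] = h_b
    feats = np.concatenate([fwd, bwd], axis=1)     # (T, 2H): bidirectional flow
    pose = feats @ params["W_pose"]                # per-frame pose parameters
    camera = feats.mean(axis=0) @ params["W_cam"]  # one camera for the clip
    return pose, camera

rng = np.random.default_rng(0)
H, D, P, C = 8, 34, 24, 7  # hidden size, 17 keypoints x 2, pose dims, camera dims
params = {
    "W_f": 0.1 * rng.normal(size=(H, D)), "U_f": 0.1 * rng.normal(size=(H, H)),
    "W_b": 0.1 * rng.normal(size=(H, D)), "U_b": 0.1 * rng.normal(size=(H, H)),
    "W_pose": 0.1 * rng.normal(size=(2 * H, P)),
    "W_cam": 0.1 * rng.normal(size=(2 * H, C)),
}
# The same weights handle sequences of different lengths in one forward pass.
pose_a, cam_a = bidirectional_regressor(rng.normal(size=(30, D)), params)
pose_b, cam_b = bidirectional_regressor(rng.normal(size=(100, D)), params)
```

Pooling the bidirectional features over all frames before predicting the camera is one simple way to encode that the camera does not change within a sequence, while the per-frame head still lets pose vary over time.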