RGB-D Fusion for Point-Cloud-Based 3D Human Pose Estimation

Ying, Jiaming; Zhao, Xu

doi:10.1109/icip42928.2021.9506588

Cited by 7 publications (2 citation statements)

References 17 publications


“…Mono‐camera approaches with optimization techniques [BKL*16, KPD19] and neural networks [PZDD17,WLLL22,HPY*22] lack depth information and struggle to track global translations. Despite offering an additional depth channel, RGBD‐based solutions [BMB*11, MSS*17, YZ21] are hindered by limited camera resolution and field of view (FOV), which makes them impractical for product‐level applications.…”

Section: Related Work (mentioning, confidence: 99%)

MOVIN: Real‐time Motion Capture using a Single LiDAR

Jang, Yang, Jang et al., 2023. Computer Graphics Forum.


Recent advancements in technology have brought forth new forms of interactive applications, such as the social metaverse, where end users interact with each other through their virtual avatars. In such applications, precise full‐body tracking is essential for an immersive experience and a sense of embodiment with the virtual avatar. However, current motion capture systems are not easily accessible to end users due to their high cost, the special skills required to operate them, or the discomfort associated with wearable devices. In this paper, we present MOVIN, a data‐driven generative method for real‐time motion capture with global tracking using a single LiDAR sensor. Our autoregressive conditional variational autoencoder (CVAE) model learns the distribution of pose variations conditioned on the given 3D point cloud from LiDAR. As a central factor for high‐accuracy motion capture, we propose a novel feature encoder that learns the correlation between historical 3D point cloud data and global and local pose features, resulting in effective learning of the pose prior. Global pose features include root translation, rotation, and foot contacts, while local features comprise joint positions and rotations. A pose generator then takes the sampled latent variable, along with the features from the previous frame, to generate a plausible current pose. Our framework accurately predicts the performer's 3D global information and local joint details while accounting for temporally coherent movement across frames. We demonstrate the effectiveness of our architecture through quantitative and qualitative evaluations against state‐of‐the‐art methods. Additionally, we implement a real‐time application to showcase our method in real‐world scenarios. The MOVIN dataset is available at https://movin3d.github.io/movin_pg2023/.
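The abstract describes an autoregressive CVAE: a latent variable is sampled and, together with the previous frame's features, decoded into the current pose. The sketch below shows only the generic CVAE ingredients such a design relies on, namely the reparameterization trick, the Gaussian KL term, and a frame-by-frame rollout. It assumes a standard-normal prior at inference time and uses a hypothetical `generator` callable in place of a trained decoder; MOVIN's actual encoder, prior, and losses may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Pose = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps, eps ~ N(0, I), with sigma = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def autoregressive_rollout(
    initial_pose: Pose,
    point_clouds: Sequence[object],
    generator: Callable[[List[float], object, Pose], Pose],
    latent_dim: int,
    seed: int = 0,
) -> List[Pose]:
    """Generate one pose per frame, each conditioned on the previous pose.

    Assumes a standard-normal prior at inference time; `generator` is a
    hypothetical stand-in for a trained pose decoder.
    """
    rng = random.Random(seed)
    prior_mu = [0.0] * latent_dim
    prior_logvar = [0.0] * latent_dim
    poses: List[Pose] = []
    prev = initial_pose
    for cloud in point_clouds:
        z = reparameterize(prior_mu, prior_logvar, rng)
        prev = generator(z, cloud, prev)  # current pose depends on z, frame input, previous pose
        poses.append(prev)
    return poses


if __name__ == "__main__":
    # Toy decoder for illustration only: shifts the previous pose by the mean of z.
    def toy_generator(z: List[float], cloud: object, prev: Pose) -> Pose:
        shift = sum(z) / len(z)
        return [p + 0.01 * shift for p in prev]

    frames = autoregressive_rollout([0.0, 0.0, 0.0], [None] * 4, toy_generator, latent_dim=8)
    print(len(frames), kl_to_standard_normal([1.0], [0.0]))  # prints: 4 0.5
```

Feeding each generated pose back in as the next step's condition is what makes the rollout autoregressive; during training the KL term would pull the encoder's posterior toward the prior that is sampled here.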



“…Figure 2 shows the idea of the proposed method. While we use a PointNet [22]-inspired architecture as the main point cloud processing network, we cannot fuse camera and LiDAR imagery at the lower levels like in other settings [38] because of the sparsity of LiDAR. We propose a cascade architecture with a CNN-based camera network for 2D pose estimation.…”

Section: Introduction (mentioning, confidence: 99%)
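The quoted passage builds on a PointNet-inspired point-cloud network. The core PointNet idea is to apply the same learned function to every point independently and then aggregate with a symmetric operation (max-pooling), so the global feature does not depend on the order of the input points. The sketch below illustrates only that idea, using a single hypothetical linear layer with hand-picked weights; it is not the architecture of the cited paper, and a real network would stack several learned layers.

```python
from typing import List, Sequence

Point = Sequence[float]


def shared_layer(point: Point, weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """One linear layer + ReLU, applied identically to every point (weights are shared)."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b) for row, b in zip(weights, bias)]


def global_feature(points: Sequence[Point], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Max-pool the per-point features channel by channel.

    Max is a symmetric function, so the result does not depend on point order.
    """
    if not points:
        raise ValueError("need at least one point")
    per_point = [shared_layer(p, weights, bias) for p in points]
    return [max(channel) for channel in zip(*per_point)]


if __name__ == "__main__":
    import random

    # Hypothetical weights mapping a 3-D point to a 3-channel feature, for illustration only.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]
    b = [0.0, 0.0, 0.5]
    cloud = [(0.2, 0.1, 0.0), (-0.3, 0.4, 1.0), (0.5, -0.2, 0.3)]
    shuffled = cloud[:]
    random.Random(0).shuffle(shuffled)
    print(global_feature(cloud, W, b) == global_feature(shuffled, W, b))  # True
```

Because every per-point feature is computed the same way and the maximum ignores ordering, shuffling the cloud leaves the global feature unchanged, which suits unordered and sparse LiDAR returns.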

Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving

Zheng, Shi, Gorban et al., 2021. Preprint.


3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many respects, including the 3D resolution and range of the data, the absence of dense depth maps, LiDAR failure modes, the relative location of the camera and LiDAR, and a high bar for estimation accuracy. Data collected for other use cases (such as virtual reality, gaming, and animation) may therefore not be usable for AV applications. This necessitates the collection and annotation of a large amount of 3D data for HPE in AV, which is time-consuming and expensive. In this paper, we propose one of the first approaches to alleviate this problem in the AV setting. Specifically, we propose a multi-modal approach that uses 2D labels on RGB images as weak supervision to perform 3D HPE. The proposed multi-modal architecture incorporates LiDAR and camera inputs with an auxiliary segmentation branch. On the Waymo Open Dataset [27], our approach achieves an approximately 22% relative improvement over a camera-only 2D HPE baseline and an approximately 6% improvement over a LiDAR-only model. Finally, careful ablation studies and part-based analyses illustrate the advantages of each of our contributions.
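The abstract uses 2D labels on RGB images to supervise 3D pose estimation but does not spell out the loss. One common way to do this, shown here as an assumption rather than as the paper's method, is to project the predicted 3D keypoints into the image with the camera intrinsics and penalize the pixel distance to the 2D labels. The sketch assumes keypoints already expressed in the camera frame, a pinhole model with hypothetical intrinsics `(fx, fy, cx, cy)`, and `None` for unlabeled joints.

```python
from typing import Optional, Sequence, Tuple

Keypoint3D = Tuple[float, float, float]
Keypoint2D = Tuple[float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), hypothetical values


def project_pinhole(p: Keypoint3D, intrinsics: Intrinsics) -> Keypoint2D:
    """Project a camera-frame 3D point (z > 0) to pixel coordinates (u, v)."""
    fx, fy, cx, cy = intrinsics
    x, y, z = p
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def weak_2d_loss(
    pred_3d: Sequence[Keypoint3D],
    labels_2d: Sequence[Optional[Keypoint2D]],
    intrinsics: Intrinsics,
) -> float:
    """Mean squared pixel error between projected 3D predictions and 2D labels.

    Joints whose 2D label is None (not annotated) are skipped.
    """
    errors = []
    for p, label in zip(pred_3d, labels_2d):
        if label is None:
            continue
        u, v = project_pinhole(p, intrinsics)
        errors.append((u - label[0]) ** 2 + (v - label[1]) ** 2)
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    K = (1000.0, 1000.0, 640.0, 360.0)
    pred = [(0.1, -0.2, 2.0), (0.0, 0.0, 4.0)]
    labels = [(693.0, 264.0), None]  # second joint is unlabeled
    print(weak_2d_loss(pred, labels, K))
```

Here the first joint projects to (690, 260), so it is 3 and 4 pixels off its label and contributes a squared error of 25; the unlabeled joint is ignored. Only the 2D projection is constrained, so depth along each camera ray is left to other signals, such as the LiDAR input.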


C3P: Cross-Domain Pose Prior Propagation for Weakly Supervised 3D Human Pose Estimation

Xiao, Zhang et al., 2022. Lecture Notes in Computer Science.

