Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning

Wang, Chaoyang; Kong, Chen; Lucey, Simon

doi:10.1109/iccv.2019.00083

Cited by 49 publications

(29 citation statements)

References 48 publications

“…(1) extending its annotation modalities, (2) using weakly-supervised learning [34,38,49] to estimate other modalities, (3) using transfer learning and domain adaptation [1,5,22] to transfer knowledge of other modalities from other data domain to our benchmark.…”

Section: Does Depth Information Help? (mentioning)

confidence: 99%

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

Sun, Cao, Jiang et al. 2021

Preprint


Figure 1 - Sample images from a video in DanceTrack. The shown images are 1, 66, 307 and 327 frames in DanceTrack0027 video. The emphasized properties of this dataset are (1) uniform appearance: humans are in highly similar and almost undistinguished appearance. (2) diverse motion: they are in complicated motion pattern and interaction. The numbers below show their identification which experiences frequent relative position switches and occlusion as well. We expect the combination of uniform appearance and complicated motion pattern makes DanceTrack a platform to encourage more comprehensive and intelligent multi-object tracking algorithms.

“…Knowledge distillation methods have been widely used in many vision tasks, including object detection [30,6,13], line detection [20], semantic segmentation [62,18,34] and human pose estimation [66,40,56,58]. DOPE [58] proposes to distill the 2D and 3D poses from three independent body part expert models to the single whole-body pose detection model.…”

Section: Related Work (mentioning)

confidence: 99%

Online Knowledge Distillation for Efficient Pose Estimation

Song et al. 2021

Preprint


Existing state-of-the-art human pose estimation methods require heavy computational resources for accurate predictions. One promising technique to obtain an accurate yet lightweight pose estimator is knowledge distillation, which distills the pose knowledge from a powerful teacher model to a less-parameterized student model. However, existing pose distillation works rely on a heavy pre-trained estimator to perform knowledge transfer and require a complex two-stage learning procedure. In this work, we investigate a novel Online Knowledge Distillation framework by distilling Human Pose structure knowledge in a one-stage manner to guarantee the distillation efficiency, termed OKDHP. Specifically, OKDHP trains a single multi-branch network and acquires the predicted heatmaps from each, which are then assembled by a Feature Aggregation Unit (FAU) as the target heatmaps to teach each branch in reverse. Instead of simply averaging the heatmaps, FAU which consists of multiple parallel transformations with different receptive fields, leverages the multi-scale information, thus obtains target heatmaps with higher-quality. Specifically, the pixelwise Kullback-Leibler (KL) divergence is utilized to minimize the discrepancy between the target heatmaps and the predicted ones, which enables the student network to learn the implicit keypoint relationship. Besides, an unbalanced OKDHP scheme is introduced to customize the student networks with different compression rates. The effectiveness of our approach is demonstrated by extensive experiments on two common benchmark datasets, MPII and COCO.
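The pixelwise KL term mentioned in this abstract can be illustrated with a small sketch. It is not the OKDHP implementation: it assumes each heatmap is first normalized into a probability map over pixels, and the names `softmax2d` and `pixelwise_kl` are hypothetical helpers introduced only for this example.

```python
import math

def softmax2d(scores):
    """Normalize a 2D grid of raw scores into a probability map summing to 1."""
    peak = max(v for row in scores for v in row)  # subtract max for stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def pixelwise_kl(target, pred, eps=1e-12):
    """KL(target || pred), accumulated pixel by pixel over two probability maps."""
    kl = 0.0
    for t_row, p_row in zip(target, pred):
        for t, p in zip(t_row, p_row):
            if t > 0.0:  # pixels with zero target mass contribute nothing
                kl += t * math.log(t / max(p, eps))
    return kl

# Toy 2x2 "heatmaps": an aggregated target and one branch's prediction
# (made-up values, assumed for illustration only).
target = softmax2d([[0.0, 1.0], [2.0, 5.0]])
student = softmax2d([[0.0, 1.0], [2.0, 3.0]])

print(pixelwise_kl(target, student))  # positive: the two maps disagree
print(pixelwise_kl(target, target))   # zero: identical maps
```

Minimizing this quantity pulls the predicted map toward the target map at every pixel, which is the role the abstract assigns to the KL term.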


“…Therefore, generalization to in-the-wild applications remains challenging. Weakly-supervised methods were proposed to address this problem using unpaired 2D and 3D annotations [38,40,20], limited available Figure 1. Sample predictions of the proposed weakly-supervised method for in-the-wild videos from 3DPW dataset.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, obtaining such information for the unsupervised learning task is still an obstacle. To the best of our knowledge, there are only a few works that propose weakly-supervised training schemes without using any 3D annotation [18,13,39,40]. [13] and [39] propose multi-view consistency as a supervision while [18] generates pseudo ground-truth 3D poses using epipolar geometry.…”

Section: Introduction (mentioning)

confidence: 99%

TriPose: A Weakly-Supervised 3D Human Pose Estimation via Triangulation from Video

Gholami, Raza, Rhodin et al. 2021

Preprint


Estimating 3D human poses from video is a challenging problem. The lack of 3D human pose annotations is a major obstacle for supervised training and for generalization to unseen datasets. In this work, we address this problem by proposing a weakly-supervised training scheme that does not require 3D annotations or calibrated cameras. The proposed method relies on temporal information and triangulation. Using 2D poses from multiple views as the input, we first estimate the relative camera orientations and then generate 3D poses via triangulation. The triangulation is only applied to the views with high 2D human joint confidence. The generated 3D poses are then used to train a recurrent lifting network (RLN) that estimates 3D poses from 2D poses. We further apply a multi-view re-projection loss to the estimated 3D poses and enforce the 3D poses estimated from multi-views to be consistent. Therefore, our method relaxes the constraints in practice, only multi-view videos are required for training, and is thus convenient for in-the-wild settings. At inference, RLN merely requires single-view videos. The proposed method outperforms previous works on two challenging datasets, Human3.6M and MPI-INF-3DHP. Codes and pretrained models will be publicly available.
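To make the triangulation step in this abstract concrete, here is a minimal two-view sketch. It uses the classical midpoint method, not necessarily the procedure used in TriPose, and it assumes the camera centers and viewing-ray directions are already known, whereas the abstract describes estimating relative camera orientations first. The function name `triangulate_midpoint` is hypothetical.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2.

    o1, o2: camera centers; d1, d2: ray directions through the same 2D joint
    as seen from each view (all assumed known for this sketch).
    """
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not recoverable")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]

# Two cameras two units apart, both observing a joint located at (1, 1, 5).
joint = triangulate_midpoint((0, 0, 0), (1, 1, 5), (2, 0, 0), (-1, 1, 5))
print(joint)
```

With noisy 2D detections the two rays no longer intersect, and the midpoint serves as a compromise estimate. That sensitivity to 2D noise is consistent with the abstract's choice to triangulate only from views with high 2D joint confidence.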
