2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00796

PoseFix: Model-Agnostic General Human Pose Refinement Network

Abstract: Multi-person pose estimation from a 2D image is an essential technique for human behavior understanding. In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose. The pose refinement was performed mainly through an end-to-end trainable multi-stage architecture in previous methods. However, they are highly dependent on pose estimation models and require careful model design. By contrast, we propose a model-agnostic pose refinement meth…
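The abstract describes feeding a tuple of (input image, input pose) to a refinement network that is independent of the upstream pose estimator. Below is a minimal, hypothetical PyTorch sketch of that model-agnostic idea: the input pose is rendered as per-joint heatmaps and concatenated with the image before being passed to a small refinement network. The helper `keypoints_to_heatmaps`, the `PoseRefineNet` class, the convolutional stack, and all tensor shapes are illustrative assumptions, not the paper's actual architecture or input-pose encoding.

```python
# Sketch of a model-agnostic pose refinement step: any off-the-shelf estimator
# produces coarse keypoints, which are encoded as Gaussian heatmaps,
# concatenated with the RGB image, and refined by a separate network.
import torch
import torch.nn as nn


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render (K, 2) keypoint (x, y) coordinates as K Gaussian heatmaps."""
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    maps = []
    for x, y in keypoints:
        maps.append(torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)))
    return torch.stack(maps)  # (K, H, W)


class PoseRefineNet(nn.Module):
    """Toy refinement network: consumes image + input-pose heatmaps, outputs refined heatmaps."""

    def __init__(self, num_joints=17):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_joints, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_joints, 1),  # one refined heatmap per joint
        )

    def forward(self, image, input_pose_heatmaps):
        x = torch.cat([image, input_pose_heatmaps], dim=1)  # the (image, pose) tuple
        return self.net(x)


# Usage: refine the output of any pose estimator without modifying that estimator.
image = torch.rand(1, 3, 256, 192)                             # cropped person image
coarse_kpts = torch.rand(17, 2) * torch.tensor([192., 256.])   # (x, y) from any model
heatmaps = keypoints_to_heatmaps(coarse_kpts, 256, 192).unsqueeze(0)
refined = PoseRefineNet()(image, heatmaps)                     # (1, 17, 256, 192)
```

Because the refiner only sees the image and the estimated keypoints, it can be trained once and applied on top of different pose estimation models, which is the model-agnostic property the abstract emphasizes.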

Cited by 153 publications (109 citation statements)
References 28 publications
“…In the area of object detection, Huang et al. [75] show that region-proposal methods (e.g., Faster R-CNN [76]) achieve higher accuracy, while single-shot methods (e.g., YOLO [77], SSD [74]) present higher runtime performance. Analogously, in human pose estimation we observe that top-down approaches also present higher accuracy but lower speed. [78] show that refinement over our original work in [3] (by applying a larger cropped image patch) results in a higher accuracy boost than refinement over other top-down approaches. As hardware gets faster and increases its memory, bottom-up methods with higher resolution might be able to reduce the accuracy gap with respect to top-down approaches.…”
Section: Trade-off Between Speed and Accuracy
confidence: 55%
“…[28] presents a network to simultaneously output keypoint detections and the corresponding keypoint group assignments. [31] designs a feedback architecture that combines the keypoint results of other pose estimation methods with the original image as the new input to the human pose estimation network. In our analysis we consider 8 state-of-the-art multi-person pose estimation methods, which are listed in Table 2.…”
Section: Data Annotation
confidence: 99%
“…Method        AP     AP0.5  AP0.75  AP_M   AP_L   Input Size  Runtime
Top-down
  HRNet [7]     0.753  0.925  0.825   0.723  0.803  384x288     0.049*
  Xiao [24]     0.723  0.915  0.803   0.695  0.768  256x192     0.110
  RMPE [22]     0.735  0.887  0.802   0.693  0.799  320x256     0.298
Bottom-up
  PAF [9]       0.469  0.737  0.493   0.403  0.561  432x368     0.081
  Osokin [10]   0.400  0.659  0.407   0.338  0.494  368x368     0.481
  PifPaf [30]   0.630  0.855  0.691   0.603  0.677  401x401     0.202
  AE [28]       0.566  0.818  0.618   0.498  0.670  512x512     0.260
  PoseFix [31]  0.411  0.647  0.412   0.303  0.559  384x288     0.250
*: without human detection
…algorithms are lower than top-down methods. After detailed analysis, we find that the numbers of predicted effective keypoints of bottom-up methods are around 10 times less than top-down methods as illustrated in Fig.…”
Section: Type
confidence: 99%