2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00796

PoseFix: Model-Agnostic General Human Pose Refinement Network

Abstract: Multi-person pose estimation from a 2D image is an essential technique for human behavior understanding. In this paper, we propose a human pose refinement network that estimates a refined pose from a tuple of an input image and input pose. The pose refinement was performed mainly through an end-to-end trainable multi-stage architecture in previous methods. However, they are highly dependent on pose estimation models and require careful model design. By contrast, we propose a model-agnostic pose refinement meth…
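The abstract describes feeding a tuple of (input image, input pose) to a refinement network that is independent of the upstream pose estimator. Below is a minimal, hypothetical PyTorch sketch of that model-agnostic idea: the input pose is rendered as per-joint heatmaps and concatenated with the image before being passed to a small refinement network. The helper `keypoints_to_heatmaps`, the `PoseRefineNet` class, the convolutional stack, and all tensor shapes are illustrative assumptions, not the paper's actual architecture or input-pose encoding.

```python
# Sketch of a model-agnostic pose refinement step: any off-the-shelf estimator
# produces coarse keypoints, which are encoded as Gaussian heatmaps,
# concatenated with the RGB image, and refined by a separate network.
import torch
import torch.nn as nn


def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """Render (K, 2) keypoint (x, y) coordinates as K Gaussian heatmaps."""
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    maps = []
    for x, y in keypoints:
        maps.append(torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)))
    return torch.stack(maps)  # (K, H, W)


class PoseRefineNet(nn.Module):
    """Toy refinement network: consumes image + input-pose heatmaps, outputs refined heatmaps."""

    def __init__(self, num_joints=17):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_joints, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_joints, 1),  # one refined heatmap per joint
        )

    def forward(self, image, input_pose_heatmaps):
        x = torch.cat([image, input_pose_heatmaps], dim=1)  # the (image, pose) tuple
        return self.net(x)


# Usage: refine the output of any pose estimator without modifying that estimator.
image = torch.rand(1, 3, 256, 192)                             # cropped person image
coarse_kpts = torch.rand(17, 2) * torch.tensor([192., 256.])   # (x, y) from any model
heatmaps = keypoints_to_heatmaps(coarse_kpts, 256, 192).unsqueeze(0)
refined = PoseRefineNet()(image, heatmaps)                     # (1, 17, 256, 192)
```

Because the refiner only sees the image and the estimated keypoints, it can be trained once and applied on top of different pose estimation models, which is the model-agnostic property the abstract emphasizes.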

Cited by 153 publications (109 citation statements)
References 28 publications
“…In the area of object detection, Huang et al. [75] show that region-proposal methods (e.g., Faster R-CNN [76]) achieve higher accuracy, while single-shot methods (e.g., YOLO [77], SSD [74]) present higher runtime performance. Analogously, in human pose estimation we observe that top-down approaches also present higher accuracy but lower speed. [78] show that refinement over our original work in [3] (by applying a larger cropped image patch) results in a higher accuracy boost than refinement over other top-down approaches. As hardware gets faster and increases its memory, bottom-up methods with higher resolution might be able to reduce the accuracy gap with respect to top-down approaches.…”
Section: Trade-off Between Speed and Accuracy
confidence: 55%
“…[28] presents a network to simultaneously output keypoint detections and the corresponding keypoint group assignments. [31] designs a feedback architecture that combines the keypoint results of other pose estimation methods with the original image as the new input to the human pose estimation network. In our analysis we consider 8 state-of-the-art multi-person pose estimation methods, which are listed in Table 2.…”
Section: Data Annotation
confidence: 99%
“…Method        AP     AP0.5  AP0.75  AP_M   AP_L   Input Size  Runtime
Top-down
  HRNet [7]     0.753  0.925  0.825   0.723  0.803  384x288     0.049*
  Xiao [24]     0.723  0.915  0.803   0.695  0.768  256x192     0.110
  RMPE [22]     0.735  0.887  0.802   0.693  0.799  320x256     0.298
Bottom-up
  PAF [9]       0.469  0.737  0.493   0.403  0.561  432x368     0.081
  Osokin [10]   0.400  0.659  0.407   0.338  0.494  368x368     0.481
  PifPaf [30]   0.630  0.855  0.691   0.603  0.677  401x401     0.202
  AE [28]       0.566  0.818  0.618   0.498  0.670  512x512     0.260
  PoseFix [31]  0.411  0.647  0.412   0.303  0.559  384x288     0.250
*: without human detection
…algorithms are lower than top-down methods. After detailed analysis, we find that the numbers of predicted effective keypoints of bottom-up methods are around 10 times less than top-down methods as illustrated in Fig.…”
Section: Type
confidence: 99%