Deep Learning-based Human Pose Estimation: A Survey

Zheng, Ce; Wu, Wenhan; Chen, Chen; Yang, Taojiannan; Zhu, Sijie; Shen, Ju; Kehtarnavaz, Nasser; Shah, Mubarak

doi:10.1145/3603618

Cited by 121 publications

(54 citation statements)

References 177 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Human pose estimation, like animal pose estimation, is most commonly approached using supervised heatmap regression on a frame-by-frame basis [49]. Unlike the animal setting, human models are trained on much larger labeled datasets containing either annotated images [50] or 3D motion capture [51].…”

Section: Discussionmentioning

confidence: 99%

“…In contrast, most animal pose estimation must contend with relatively scarce labels, lower quality videos, and bespoke sets of labels to track, varying by species and lab. Though human pose estimation models can impressively track crowds of moving humans, doing downstream science using the keypoints still presents several challenges [49] similar to those discussed in the Results. Lightning Pose can be applied to single-human pose estimation, by fine-tuning an off-the-shelf human pose estimation backbone to specific experimental setups (such as patients in a clinic), while enforcing our spatiotemporal constraints (or new ones).…”

Section: Discussionmentioning

confidence: 99%

See 1 more Smart Citation

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling, and cloud-native open-source tools

Biderman

Whiteway

Hurwitz

et al. 2023

Preprint

View full text Add to dashboard Cite

Pose estimation algorithms are shedding new light on animal behavior and intelligence. Most existing models are only trained with labeled frames (supervised learning). Although effective in many cases, the fully supervised approach requires extensive image labeling, struggles to generalize to new videos, and produces noisy outputs that hinder downstream analyses. We address each of these limitations with a semi-supervised approach that leverages the spatiotemporal statistics of unlabeled videos in two different ways. First, we introduce unsupervised training objectives that penalize the network whenever its predictions violate smoothness of physical motion, multiple-view geometry, or depart from a low-dimensional subspace of plausible body configurations. Second, we design a new network architecture that predicts pose for a given frame using temporal context from surrounding unlabeled frames. These context frames help resolve brief occlusions or ambiguities between nearby and similar-looking body parts. The resulting pose estimation networks achieve better performance with fewer labels, generalize better to unseen videos, and provide smoother and more reliable pose trajectories for downstream analysis; for example, these improved pose trajectories exhibit stronger correlations with neural activity. We also propose a Bayesian post-processing approach based on deep ensembling and Kalman smoothing that further improves tracking accuracy and robustness. We release a deep learning package that adheres to industry best practices, supporting easy model development and accelerated training and prediction. Our package is accompanied by a cloud application that allows users to annotate data, train networks, and predict new videos at scale, directly from the browser.

show abstract

Section: Discussionmentioning

confidence: 99%

Section: Discussionmentioning

confidence: 99%

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling, and cloud-native open-source tools

Biderman

Whiteway

Hurwitz

et al. 2023

Preprint

View full text Add to dashboard Cite

show abstract

“…The survey by Holte et al provides a comparative analysis of early works (before 2012) in human pose estimation and activity recognition from multi-view videos (Holte et al, 2012). A recent comprehensive survey explores Human Pose Estimation (HPE) by categorizing methods based on 2D or 3D scenarios, single-view or multi-view approaches, and diverse data sources, employing various learning paradigms C. Zheng (2023). However, this survey primarily focuses on reviewing different approaches and their accuracy results, without delving into common methodologies.…”

Section: Existing Survyesmentioning

confidence: 99%

Unveiling the Landscape of Human Pose Estimation

Yang,

Dasgupta,

Liu

et al. 2024

Preprint

Self Cite

View full text Add to dashboard Cite

This paper presents a comprehensive survey and methodology for deep learning-based solutions in articulated human pose estimation (HPE). Recent advances in deep learning have revolutionized the HPE field, with the capturing system transitioning from multi-modal to a regular color camera and from multi-views to a monocular view, opening up numerous applications. However, the increasing variety of deep network architectures has resulted in a vast literature on the topic, making it challenging to identify commonalities and differences among 1 diverse HPE approaches. Therefore, this paper serves two objectives: firstly, it provides a thorough survey of over 100 research papers published since 2015, focusing on deep learning-based solutions for monocular HPE; secondly, it develops a comprehensive methodology that systematically combines existing works and summarizes a unified framework for the HPE problem and its modular components. Unlike previous surveys, this study places emphasis on methodology development in order to provide betters insights and learning opportunities for researchers in the field of computer vision. The paper also summarizes and discusses the quantitative performance of the reviewed methods on popular datasets, while highlighting the challenges involved, such as occlusion and viewpoint variation. Finally, future research directions, such as incorporating temporal information and 3D pose estimation, along with potential solutions to address the remaining challenges in HPE, are presented.

show abstract

“…Reimers et al (2012) [1] utilized open pose for pose recognition, while Muhammad Usama et al (2017) [4] employed Microsoft Kinect to obtain real-time human joint points for identifying yoga poses. Shih-En et al (2016) [5] proposed an architecture leveraging multiple convolutional networks to enhance joint estimates, and Bogo et al (2016) [7] estimated 3D pose and mesh shape from a single RGB image.…”

Section: Literature Reviewmentioning

confidence: 99%

GYM Grace Tracker

2023

IRJMETS

View full text Add to dashboard Cite

Inactivity is a major contributor to global obesity, impacting people worldwide Fitness plays a pivotal role in maintaining a healthy lifestyle and is a significant gauge of one's health-related quality of life. While a fitness trainer can motivate and guide daily exercise, the associated costs and setting constraints pose challenges. Independent workouts, though beneficial, can be ineffective and dangerous without proper supervision, leading to fatal mistakes. The Gym Grace Tracker, a groundbreaking web application, integrates cutting-edge AI into fitness, addressing the challenges of prioritizing physical well-being in today's demanding world. Designed for all fitness levels, it offers a revolutionary solution amid the surge in home-based fitness due to sedentary lifestyles and the COVID-19 pandemic. The Gym Grace Tracker provides a comprehensive and accessible tool, mitigating the increased risk of exercise-related injuries by responding to the lack of professional guidance.

show abstract

Deep Learning-based Human Pose Estimation: A Survey

Cited by 121 publications

References 177 publications

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling, and cloud-native open-source tools

Lightning Pose: improved animal pose estimation via semi-supervised learning, Bayesian ensembling, and cloud-native open-source tools

Unveiling the Landscape of Human Pose Estimation

GYM Grace Tracker

Contact Info

Product

Resources

About