OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

Cao, Zhe; Hidalgo, Gines; Simon, Tomas; Wei, Shih-En; Sheikh, Yaser

doi:10.1109/tpami.2019.2929257

Cited by 3,268 publications

(2,449 citation statements)

References 69 publications

Supporting

Mentioning

2,155

Contrasting

Unclassified


“…The 2D projections are thereafter grouped into mutually exclusive clusters, and for each cluster, we allocate a representative pose. Then, in the second stage, for an input frame taken from a video stream, we extract in real time its 2D pose using the OpenPose network. The 2D pose is then rescaled so as to be consistent with the 2D projections stored in the database.…”

Section: Methods Overview (mentioning)

confidence: 99%

“…The human poses in these CMU 3D data are represented by n = 30 joint positions and rotations. In our work, we use 2D poses that are estimated from an input monocular video using OpenPose (see Section 5 for more details) that are represented by m = 14 joint locations (see Figure ). Thus, in order to have a uniform and comparable skeleton, we retarget the CMU .bvh data to a 3D skeleton that its projection in T‐pose matches the 2D skeleton (in T‐pose) returned by OpenPose (see Figure for the new skeleton); the 2D pose projections are then scaled so as their bounding box remains constant over time.…”

Section: Motion Database (mentioning)

confidence: 99%

“…Most papers in the literature estimate the human pose in two‐dimensional (2D) for one or multiple characters by localizing joint keypoints in pixel space or by extracting the shape silhouette and then retrieving the closest neighbor from a database. More recently, and with the advent of deep learning (DL), the community has been moving to learning‐based discriminative methods, where the effectiveness of 2D human pose estimation has greatly improved. The 3D skeletal reconstruction, though, is a much harder problem.…”

Section: Introduction (mentioning)

confidence: 99%

“…To deal with the limitations of the prior work such as the bone length constraints violations, the simultaneously capturing of multiple characters, and the temporal consistency of the reconstructed skeletons, we generate a database with numerous 2D projections by rotating a small angle at a time, the yaw axis of 3D skeletons. Then, we match the input 2D poses (which are extracted from a single video stream using the OpenPose network) with the projections on the database and retrieve the best 3D skeleton pose that is temporally consistent to the skeleton of the previous frames, producing natural and smooth motion.…”

Section: Introduction (mentioning)

confidence: 99%

“…[3][4][5] More recently, and with the advent of deep learning (DL), the community has been moving to learning-based discriminative methods, where the effectiveness of 2D human pose estimation has greatly improved. [6][7][8] The 3D skeletal reconstruction, though, is a much harder problem. [9,10] Even though there are methods that are effective at 3D pose reconstruction, they are usually not real-time implementable and suffer from depth and scale ambiguities.…”

Section: Introduction (mentioning)

confidence: 99%


Real‐time 3D human pose and motion reconstruction from monocular RGB videos

Yiannakides

Aristidou

Chrysanthou

2019

Computer Animation & Virtual


Real‐time three‐dimensional (3D) pose estimation is of high interest in interactive applications, virtual reality, activity recognition, and most importantly, in the growing gaming industry. In this work, we present a method that captures and reconstructs the 3D skeletal pose and motion articulation of multiple characters using a monocular RGB camera. Our method deals with this challenging, but useful, task by taking advantage of the recent development in deep learning that allows two‐dimensional (2D) pose estimation of multiple characters and the increasing availability of motion capture data. We fit 2D estimated poses, extracted from a single camera via OpenPose, with a 2D multiview joint projections database that is associated with their 3D motion representations. We then retrieve the 3D body pose of the tracked character, ensuring throughout that the reconstructed movements are natural, satisfy the model constraints, are within a feasible set, and are temporally smooth without jitters. We demonstrate the performance of our method in several examples, including human locomotion, simultaneously capturing of multiple characters, and motion reconstruction from different camera views.
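The database-matching idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the orthographic projection, the unit-height bounding-box normalization, the Frobenius distance, and all function names (`normalize_pose`, `build_projection_database`, `nearest_projection`) are assumptions made for illustration, and the temporal-consistency step mentioned in the abstract is omitted.

```python
import numpy as np

def normalize_pose(pose_2d):
    """Shift and scale a (J, 2) pose so its bounding box starts at 0 with unit height.

    Assumption: a fixed-size bounding box stands in for the paper's rescaling step.
    """
    pose_2d = np.asarray(pose_2d, dtype=float)
    mins = pose_2d.min(axis=0)
    maxs = pose_2d.max(axis=0)
    height = max(maxs[1] - mins[1], 1e-8)  # guard against degenerate poses
    return (pose_2d - mins) / height

def yaw_rotation(angle_rad):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def build_projection_database(skeleton_3d, step_deg=10.0):
    """Project one (J, 3) skeleton to 2D at yaw angles spaced step_deg apart.

    Assumption: orthographic projection (drop the z coordinate).
    """
    angles = np.deg2rad(np.arange(0.0, 360.0, step_deg))
    projections = [
        normalize_pose((skeleton_3d @ yaw_rotation(a).T)[:, :2]) for a in angles
    ]
    return angles, np.stack(projections)

def nearest_projection(query_2d, projections):
    """Return (index, distance) of the stored projection closest to the query."""
    q = normalize_pose(query_2d)
    dists = np.linalg.norm(projections - q, axis=(1, 2))  # Frobenius norm per entry
    return int(np.argmin(dists)), float(dists.min())
```

Because both the query and the stored projections pass through the same normalization, the lookup is insensitive to where the person appears in the frame and to their apparent size. A real system would store projections for many poses and would also need a rule for preferring candidates that are close to the previous frame's result.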


Deep Learning Analysis of Surgical Video Recordings to Assess Nontechnical Skills

Harari

Dias

Kennedy-Metz et al.

2024

JAMA Netw Open


Importance: Assessing nontechnical skills in operating rooms (ORs) is crucial for enhancing surgical performance and patient safety. However, automated and real-time evaluation of these skills remains challenging.

Objective: To explore the feasibility of using motion features extracted from surgical video recordings to automatically assess nontechnical skills during cardiac surgical procedures.

Design, Setting, and Participants: This cross-sectional study used video recordings of cardiac surgical procedures at a tertiary academic US hospital collected from January 2021 through May 2022. The OpenPose library was used to analyze videos to extract body pose estimations of team members and compute various team motion features. The Non-Technical Skills for Surgeons (NOTSS) assessment tool was employed for rating the OR team’s nontechnical skills by 3 expert raters.

Main Outcomes and Measures: NOTSS overall score, with motion features extracted from surgical videos as measures.

Results: A total of 30 complete cardiac surgery procedures were included: 26 (86.6%) were on-pump coronary artery bypass graft procedures and 4 (13.4%) were aortic valve replacement or repair procedures. All patients were male, and the mean (SD) age was 72 (6.3) years. All surgical teams were composed of 4 key roles (attending surgeon, attending anesthesiologist, primary perfusionist, and scrub nurse) with additional supporting roles. NOTSS scores correlated significantly with trajectory (r = 0.51, P = .005), acceleration (r = 0.48, P = .008), and entropy (r = −0.52, P = .004) of team displacement. Multiple linear regression, adjusted for patient factors, showed average team trajectory (adjusted R² = 0.335; coefficient, 10.51 [95% CI, 8.81-12.21]; P = .004) and team displacement entropy (adjusted R² = 0.304; coefficient, −12.64 [95% CI, −20.54 to −4.74]; P = .003) were associated with NOTSS scores.

Conclusions and Relevance: This study suggests a significant link between OR team movements and nontechnical skills ratings by NOTSS during cardiac surgical procedures, suggesting automated surgical video analysis could enhance nontechnical skills assessment. Further investigation across different hospitals and specialties is necessary to validate these findings.
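To make the kind of analysis in this abstract concrete, here is a hedged sketch of how trajectory, acceleration, and displacement entropy might be derived from a tracked keypoint and then correlated with a rating. The abstract does not define these features, so every definition below (path length, mean acceleration magnitude, Shannon entropy of a step-length histogram), the `fps` and `bins` parameters, and the function names are assumptions for illustration only.

```python
import numpy as np

def motion_features(positions, fps=30.0, bins=10):
    """Summarize a (T, 2) track of one person's position over T >= 3 frames.

    All three definitions are illustrative assumptions, not the study's:
      trajectory   = total path length
      acceleration = mean magnitude of the second finite difference
      entropy      = Shannon entropy (bits) of the step-length histogram
    """
    positions = np.asarray(positions, dtype=float)
    steps = np.diff(positions, axis=0)          # per-frame displacement vectors
    step_len = np.linalg.norm(steps, axis=1)    # displacement magnitude per frame
    velocity = steps * fps                      # units per second
    accel = np.diff(velocity, axis=0) * fps     # units per second squared
    counts, _ = np.histogram(step_len, bins=bins)
    p = counts[counts > 0] / counts.sum()       # drop empty bins before the log
    entropy = float(-(p * np.log2(p)).sum())
    return {
        "trajectory": float(step_len.sum()),
        "acceleration": float(np.linalg.norm(accel, axis=1).mean()),
        "entropy": entropy,
    }

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])
```

In use, one would compute `motion_features` per procedure (for example, averaged over team members) and pass the per-procedure values together with the corresponding ratings to `pearson_r`. Significance testing and the adjusted regression reported in the abstract are outside the scope of this sketch.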
