CrowdPose: Efficient Crowded Scenes Pose Estimation and a New Benchmark

Li, Jiefeng; Wang, Can; Zhu, Hao; Mao, Yihuan; Fang, Hao-Shu; Lu, Cewu

doi:10.1109/cvpr.2019.01112

Cited by 431 publications (259 citation statements; 236 mentioning)

References 43 publications

“…Each of them comes with different advantages and disadvantages. Although bottom-up methods seem better suited to crowded scenes, since they process whole images and their runtime is thus less dependent on the actual number of persons, the literature shows that top-down methods such as [6] and [11] perform comparably well on publicly available datasets.…”

Section: Top-down and Bottom-up Methods (mentioning, confidence: 99%)

“…However, some recent works focus on this subject. Li et al. [6] introduce a new benchmark for evaluating pose estimation methods on this problem and propose a method in which human bounding-box proposals obtained by a human detector are fed into a joint-candidate single-person pose estimator (JC-SPPE). JC-SPPE locates joint candidates with different response scores on the heatmap.…”

Section: Further Work on Crowd Pose Estimation (mentioning, confidence: 99%)
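Keeping several joint candidates per heatmap, instead of only the single highest response, can be sketched as a search for local maxima. This is a minimal illustration, not the CrowdPose implementation: the heatmap is assumed to be a plain 2-D list of scores, and the function name `joint_candidates`, the 3×3 neighbourhood test, the score threshold and the candidate limit are all hypothetical choices.

```python
from typing import List, Tuple


def joint_candidates(heatmap: List[List[float]], threshold: float = 0.1,
                     max_candidates: int = 5) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for local maxima scoring at least `threshold`,
    highest score first (hypothetical helper, not the authors' code)."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue  # ignore weak responses
            # A peak is at least as large as every cell in its 3x3 neighbourhood
            # (the neighbourhood is clipped at the heatmap border).
            is_peak = all(
                v >= heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            )
            if is_peak:
                peaks.append((r, c, v))
    # Keep the strongest candidates, each with its own response score.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:max_candidates]


# Toy heatmap with two separated responses, e.g. the same joint type
# belonging to two overlapping people.
example = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.6, 0.2],
    [0.0, 0.0, 0.0, 0.2, 0.0],
]
print(joint_candidates(example))  # [(1, 1, 0.9), (3, 3, 0.6)]
```

With this simple test, a plateau of equal scores yields several adjacent candidates; a practical pipeline would likely need an extra suppression step.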

“…Using Grand Theft Auto V, a widely known computer game with an active modding community, they created a synthetic dataset of humans annotated with highly accurate keypoint information, alongside a tool that makes it easy to generate one's own synthetic datasets. Both [3] and [6] proposed interesting datasets, which we used for our experiments, whereas [3] is more similar to surveillance situations than the dataset introduced by [6].…”

Section: Further Work on Crowd Pose Estimation (mentioning, confidence: 99%)

“…Another goal in creating an extension for the JTA dataset was to increase the share of highly crowded scenarios. To gauge the "crowdedness" of the datasets, we used the CrowdIndex proposed by Li et al. [6]. The different CrowdIndex distributions are presented in Fig.…”

Section: JTA-Ext (mentioning, confidence: 99%)
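As we understand the definition in Li et al. [6], the CrowdIndex of an image averages, over its n annotated persons, the ratio N_b^i / N_a^i, where N_a^i is the number of joints of person i and N_b^i is the number of joints belonging to other persons that fall inside person i's bounding box. The sketch below assumes axis-aligned boxes and 2-D joint coordinates; the `Person` type and the choice to skip persons without joints are illustrative assumptions, not part of the original definition.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Person:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    joints: List[Point]                     # annotated joints as (x, y)


def _inside(p: Point, box: Tuple[float, float, float, float]) -> bool:
    x, y = p
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def crowd_index(people: List[Person]) -> float:
    """Mean over persons of (other persons' joints inside own box) / (own joints)."""
    ratios = []
    for i, person in enumerate(people):
        n_a = len(person.joints)
        if n_a == 0:
            continue  # assumption: persons without joints are skipped
        # Count joints of every *other* person that lie inside this box.
        n_b = sum(
            _inside(j, person.box)
            for k, other in enumerate(people) if k != i
            for j in other.joints
        )
        ratios.append(n_b / n_a)
    return sum(ratios) / len(ratios) if ratios else 0.0


# Two side-by-side people whose boxes overlap in the strip 8 <= x <= 10.
people = [
    Person(box=(0, 0, 10, 10), joints=[(2, 2), (5, 5), (8, 8), (9, 1)]),
    Person(box=(8, 0, 18, 10), joints=[(9, 5), (12, 5), (15, 5), (16, 9)]),
]
print(crowd_index(people))  # 0.375, i.e. the mean of 1/4 and 2/4
```

An isolated person contributes 0, so the index grows only when people's boxes actually contain each other's joints.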


Human Pose Estimation for Real-World Crowded Scenarios

Golda, Kalb, Schumann, et al. (2019)

2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)


Human pose estimation has recently made significant progress with the adoption of deep convolutional neural networks, and many of its applications have attracted tremendous interest in recent years. However, many of these applications require pose estimation for human crowds, which is still a rarely addressed problem. This work therefore explores methods to optimize pose estimation for human crowds, focusing on the challenges introduced by larger crowds: people in close proximity to each other, mutual occlusions, and partial visibility of people due to the environment. To address these challenges, multiple approaches are evaluated, including the explicit detection of occluded body parts, a data augmentation method that generates occlusions, and the use of the synthetically generated dataset JTA [3]. To overcome the transfer gap of JTA, which stems from its low pose variety and less dense crowds, an extension dataset is created to ease its use in real-world applications.
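The abstract mentions a data augmentation method that generates occlusions but does not describe it, so the following is only a generic cutout-style sketch under our own assumptions: a grayscale image stored as nested lists, a single random rectangle filled with a constant value, and COCO-style keypoint visibility flags (0 = not labelled, 1 = labelled but not visible, 2 = labelled and visible). The helper name `occlude` and its parameters are hypothetical. Visible keypoints covered by the rectangle are relabelled as occluded, which is the kind of target an explicit occluded-part detector could be trained on.

```python
import random
from typing import List, Tuple

# (x, y, v) with COCO-style visibility: 0 = not labelled,
# 1 = labelled but not visible, 2 = labelled and visible.
Keypoint = Tuple[int, int, int]


def occlude(image: List[List[int]], keypoints: List[Keypoint], rng: random.Random,
            max_frac: float = 0.4, fill: int = 0) -> Tuple[List[List[int]], List[Keypoint]]:
    """Paste one random constant-valued rectangle onto a copy of `image`
    and mark the visible keypoints it covers as occluded (hypothetical helper)."""
    h, w = len(image), len(image[0])
    # Occluder size: between 1 pixel and max_frac of each image dimension.
    occ_h = rng.randint(1, max(1, int(h * max_frac)))
    occ_w = rng.randint(1, max(1, int(w * max_frac)))
    # Top-left corner chosen so the rectangle lies fully inside the image.
    top = rng.randint(0, h - occ_h)
    left = rng.randint(0, w - occ_w)

    out = [row[:] for row in image]  # leave the input image untouched
    for y in range(top, top + occ_h):
        for x in range(left, left + occ_w):
            out[y][x] = fill

    new_kps = []
    for x, y, v in keypoints:
        covered = left <= x < left + occ_w and top <= y < top + occ_h
        # Only labelled, visible joints change state; others keep their flag.
        new_kps.append((x, y, 1 if covered and v == 2 else v))
    return out, new_kps


# Usage: a seeded generator makes the augmentation reproducible.
rng = random.Random(0)
image = [[255] * 8 for _ in range(6)]
keypoints = [(1, 1, 2), (4, 3, 2), (6, 5, 0)]
aug_image, aug_kps = occlude(image, keypoints, rng)
print(aug_kps)
```

Pasting real object or person crops instead of a flat rectangle would be a natural variation; which variant the authors used is not stated here.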


“…Hierarchical/Graphical Models in Computer Vision: Hierarchical/graphical models are powerful for building structured representations, which can reflect task-specific relations and constraints. From early distributional semantic models, part-based models [16,17], MRF/CRF [31], And-Or grammar model [59], to deep structural networks [30,15], graph neural networks [20], trainable CRF [79], etc., hierarchical/graphical models have found applications in a wide variety of core computer vision tasks, such as object recognition [55], human parsing [40,41,81], pose estimation [34,66,61,68,35], visual dialog etc., to the extent that they are now ubiquitous in the field. Inspired by their general success, we leverage structural information to design our approach.…”

Section: Related Work (mentioning, confidence: 99%)

Learning Compositional Neural Information Fusion for Human Parsing

Wang, Zhang, et al. (2019)

2019 IEEE/CVF International Conference on Computer Vision (ICCV)


This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively. In addition, the fusion of multi-source information is conditioned on the inputs, i.e., by estimating and considering the confidence of the sources. The whole model is end-to-end differentiable, explicitly modeling information flows and structures. Our approach is extensively evaluated on four popular datasets, outperforming the state of the art in all cases, with a fast processing speed of 23 fps. Our code and results have been released to help ease future research in this direction.

* Equal contribution. † Corresponding author: Yanwei Pang.
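The abstract describes fusing the direct, bottom-up and top-down predictions with weights conditioned on the estimated confidence of each source. A minimal sketch of confidence-weighted fusion follows, assuming each source yields a per-class score vector and a scalar confidence logit. Here the logits are passed in by hand, whereas in the paper they are estimated from the inputs by the network; the names `fuse` and `softmax` and the source keys are hypothetical.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(predictions: Dict[str, List[float]],
         confidence_logits: Dict[str, float]) -> List[float]:
    """Weighted sum of per-source score vectors; the weights are the softmax
    of the per-source confidence logits, so they are positive and sum to 1."""
    names = sorted(predictions)  # fixed order so weights line up with sources
    weights = softmax([confidence_logits[n] for n in names])
    length = len(predictions[names[0]])
    return [sum(w * predictions[n][k] for w, n in zip(weights, names))
            for k in range(length)]


preds = {
    "direct": [0.9, 0.1],
    "bottom_up": [0.6, 0.4],
    "top_down": [0.3, 0.7],
}
# Equal confidences: plain average of the three sources.
print(fuse(preds, {"direct": 0.0, "bottom_up": 0.0, "top_down": 0.0}))
# A much higher confidence for one source makes the result follow that source.
print(fuse(preds, {"direct": 5.0, "bottom_up": 0.0, "top_down": 0.0}))
```

Normalizing with a softmax keeps the fused output a convex combination of the sources, so it stays on the same scale as each individual prediction.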


Exploiting Human Pose for Weakly-Supervised Temporal Action Localization

Zhu (2019)

Pattern Recognition and Computer Vision

