Enhanced 3D Human Pose Estimation from Videos by Using Attention-Based Neural Network with Dilated Convolutions

Liu, Ruixu; Shen, Ju; Wang, He; Chen, Chen; Cheung, Susan; Asari, Vijayan K.

doi:10.1007/s11263-021-01436-0

Cited by 20 publications

(6 citation statements)

References 49 publications


“…Sequential networks are used in pose estimation to ‘lift’ an estimated 2D pose to 3D [ 30 , 31 ]. Recent research combines temporal information with lifting to improve accuracy during frames where one limb occludes another in the view of the camera (self-occlusion) or where not all key points were detected [ 15 , 32 , 33 ]. In contrast to CNNs, these sequential networks do not take a single frame as input but exploit temporal dependencies in the data for their prediction.…”

Section: Methods (mentioning)

confidence: 99%

Towards Single Camera Human 3D-Kinematics

Bittner, Yang, Zhang, et al. 2022

Sensors

Markerless estimation of 3D kinematics has great potential to clinically diagnose and monitor movement disorders without referrals to expensive motion capture labs; however, current approaches are limited by performing multiple de-coupled steps to estimate the kinematics of a person from videos. Most current techniques work in a multi-step approach by first detecting the pose of the body and then fitting a musculoskeletal model to the data for accurate kinematic estimation. Errors in the training data of the pose detection algorithms, model scaling, as well as the requirement of multiple cameras limit the use of these techniques in a clinical setting. Our goal is to pave the way toward fast, easily applicable and accurate 3D kinematic estimation. To this end, we propose a novel approach for direct 3D human kinematic estimation (D3KE) from videos using deep neural networks. Our experiments demonstrate that the proposed end-to-end training is robust and outperforms 2D and 3D markerless motion capture based kinematic estimation pipelines in terms of joint angle error by a large margin (35%, from 5.44 to 3.54 degrees). We show that D3KE is superior to the multi-step approach and can run at video framerate speeds. This technology shows the potential for clinical analysis from mobile devices in the future.

“…The classification of body posture construction using the K-NN method, although more straightforward than the application of facial recognition, has accurate results [25]. Deep learning [22], multi-scale temporal features, spatio-temporal KCS pose differentiation, and occlusion data augmentation [29] have been used for the 2D to 3D development of human pose estimation [30,31]. Other methods use attention models [32] and multi-scale networks with phase inference optimization [33], introducing many parameters requiring manual tuning.…”

Section: Related Work (mentioning)

confidence: 99%

“…Other methods use attention models [32] and multi-scale networks with phase inference optimization [33], introducing many parameters requiring manual tuning. The performance of the graphical model-based approach has been surpassed by convolutional neural networks (CNNs) [31,34].…”

Section: Related Work (mentioning)

confidence: 99%

Pose Detection and Recurrent Neural Networks for Monitoring Littering Violations

Husni, Felia, Abdurrahman, et al. 2023

Eng

Infrastructure development requires various considerations to maintain its continuity. Some public facilities cannot survive due to human indifference and irresponsible actions. Unfortunately, the government has to spend a lot of money, effort, and time to repair the damage. One of the destructive behaviors that can have an impact on infrastructure and environmental problems is littering. Therefore, this paper proposes a device as an alternative for catching littering rule violators. The proposed device can be used to monitor littering and provide warnings to help officers responsible for capturing the violators. In this innovation, the data obtained by the camera are sent to a mini-PC. The device will send warning information to a mobile phone when someone litters. Then, a speaker will turn on and issue a sound warning: “Do not litter”. The device uses pose detection and a recurrent neural network (RNN) to recognize a person’s activity. All activities can be monitored in a more distant place using IoT technology. In addition, this tool can also monitor environmental conditions and replace city guards to monitor the area. Thus, the municipality can save money and time.

“…Human pose estimation. Human pose estimation has attracted a lot of research interest in recent years (Yi, Zhou, and Xu 2021; Benzine et al. 2021; Xu and Takano 2021; Gong, Zhang, and Feng 2021; Yuan et al. 2021). In general, existing human pose estimation methods can be divided into two categories: bottom-up methods (Cao et al. 2017; Kocabas, Karagoz, and Akbas 2018; Kreiss, Bertoni, and Alahi 2019; Li et al. 2019a; Liu et al. 2021a) and top-down methods (Fang et al. 2017; Xiao, Wu, and Wei 2018; Wei et al. 2016; Sun et al. 2019; Moon, Chang, and Lee 2019; Benzine et al. 2021). Human pose estimation works have been deployed into many applications such as digital human driven.…”

Section: Related Work (mentioning)

confidence: 99%

Human Pose Driven Object Effects Recommendation

Fan, Li, Liu, et al. 2022

Preprint

In this paper, we research the new topic of object effects recommendation in micro-video platforms, which is a challenging but important task for many practical applications such as advertisement insertion. To avoid the background bias introduced by directly learning video content from image frames, we propose to utilize the meaningful body language hidden in 3D human pose for recommendation. To this end, a novel human pose driven object effects recommendation network termed PoseRec is introduced. PoseRec leverages the advantages of 3D human pose detection and learns information from multi-frame 3D human pose for video-item registration, resulting in high-quality object effects recommendation performance. Moreover, to solve the inherent ambiguity and sparsity issues that exist in object effects recommendation, we further propose a novel item-aware implicit prototype learning module and a novel pose-aware transductive hard-negative mining module to better learn pose-item relationships. In addition, to benchmark methods for the new research topic, we build a new dataset for object effects recommendation named Pose-OBE. Extensive experiments on Pose-OBE demonstrate that our method outperforms strong baselines.
