2023
DOI: 10.1016/j.neucom.2023.126284

3D human pose and shape estimation via de-occlusion multi-task learning

Cited by 14 publications (7 citation statements)
References 15 publications
“…(d) Multi-modal question answering: Future question-answering systems need to support multiple forms of input, such as text, images, and voice [69], to meet the diverse needs of users. For example, Ning et al. [70] proposed a novel method called Differentiable Image-Language Fusion (DILF) for multi-view image and language fusion.…”
Section: Trends and Conclusion (mentioning)
confidence: 99%
“…The average accuracy of predicting key points is the highest when introducing a regression module based on the anchor pose and a module fused with 3D pose data input, with only a few keypoints showing a slight decrease. Overall, the FPNet has the highest prediction accuracy, but there is still room for improvement [45]. The samples in Fig. 7 exhibit significant attitude changes and scale variances.…”
Section: Performance Between Multiple Datasets (mentioning)
confidence: 99%
“…However, despite the remarkable achievements made in the field of robot music performance, there still exist several limitations and unresolved issues in existing research (Ran et al., 2023). Firstly, current robot music performances face challenges in the realm of multi-modal fusion.…”
Section: Related Work (mentioning)
confidence: 99%