2021
DOI: 10.1609/aaai.v35i2.16202
|View full text |Cite
|
Sign up to set email alerts
|

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

Abstract: Despite the recent progress, 3D multi-person pose estimation from monocular videos is still challenging due to the commonly encountered problem of missing information caused by occlusion, partially out-of-frame target persons, and inaccurate person detection. To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses that does not require camera parameters. In particula… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
14
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
4
2
2

Relationship

0
8

Authors

Journals

citations
Cited by 20 publications
(14 citation statements)
references
References 35 publications
0
14
0
Order By: Relevance
“…As per our verification, most of the vision data could be converted into the format of embedding and compared through contrastive learning. Meanwhile, the research work [156]- [159] discussed how deep learning works well in skeleton-based behavior recognition through the common frameworks with GCN and CNN. The double stream of combined score results has been verified its advantages in multiple behavior recognition tasks with double joints stream in research [159], and the combination of multiple models in deep feature extractions [156].…”
Section: Deep Neural Network and Comparisonmentioning
confidence: 99%
See 1 more Smart Citation
“…As per our verification, most of the vision data could be converted into the format of embedding and compared through contrastive learning. Meanwhile, the research work [156]- [159] discussed how deep learning works well in skeleton-based behavior recognition through the common frameworks with GCN and CNN. The double stream of combined score results has been verified its advantages in multiple behavior recognition tasks with double joints stream in research [159], and the combination of multiple models in deep feature extractions [156].…”
Section: Deep Neural Network and Comparisonmentioning
confidence: 99%
“…Meanwhile, the research work [156]- [159] discussed how deep learning works well in skeleton-based behavior recognition through the common frameworks with GCN and CNN. The double stream of combined score results has been verified its advantages in multiple behavior recognition tasks with double joints stream in research [159], and the combination of multiple models in deep feature extractions [156]. The multi-modality human behavior recognition has also proven its availability in the increased identification accuracy.…”
Section: Deep Neural Network and Comparisonmentioning
confidence: 99%
“…As our verification, most of vision data could be converted into the format of embedding and compared through contrastive learning. Meanwhile, the research work [122]- [125] discussed how the deep learning works well in the skeleton-based behavior recognition through the common frameworks with GCN and CNN. The double stream of combined score results have been verified its advantages in multiple behavior recognition tasks with double joints stream in research [125], and multiple models of deep feature extractions [122].…”
Section: Deep Neural Network and Comparisonmentioning
confidence: 99%
“…Meanwhile, the research work [122]- [125] discussed how the deep learning works well in the skeleton-based behavior recognition through the common frameworks with GCN and CNN. The double stream of combined score results have been verified its advantages in multiple behavior recognition tasks with double joints stream in research [125], and multiple models of deep feature extractions [122]. The multi-modality human behavior recognition has also been proofed its availability in the increased identification accuracy.…”
Section: Deep Neural Network and Comparisonmentioning
confidence: 99%
“…For example, [7] proposed a deep regression network to estimate absolute 3D coordinates directly from images without intermediate 2D pose representations, and [8] improved the accuracy of poste estimations using a keypoint coordinate representation based on camera line-of-sight. Unlike these approaches, which estimate 3D poses from a single image, methods using multiple images of video frames have also been studied; for example, the authors of [9] used temporal convolutional networks (TCNs) to obtain absolute 3D human poses. While the above methods require a high-end computer with a GPU, the authors of [10] improved MobileNetV2 model [13] designed as a lightweight DNN to enable pose estimation on smartphones.…”
Section: Related Workmentioning
confidence: 99%