2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.01279

Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

Cited by 66 publications (27 citation statements)
References 24 publications
“…Lately, there has been a trend to adopt the multi-head self-attention (MHA) module [199] for long-term sequence dependency modeling [109], [135]. Wan et al [148] modify the original MHA to perform spatial and temporal encoding simultaneously.…”
Section: Recovery From Monocular Videos (mentioning)
confidence: 99%
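For reference, the sketch below shows one minimal way to apply a multi-head self-attention layer over per-frame features for long-term temporal dependency modeling. It is not the design of [148]; the module name, feature dimensions, sequence length, and learnable positional embedding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Minimal sketch: self-attention over a sequence of per-frame features."""
    def __init__(self, feat_dim=2048, num_heads=8, seq_len=16):
        super().__init__()
        # learnable positional embedding so attention is aware of frame order
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, feat_dim))
        self.mha = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, x):                    # x: (batch, seq_len, feat_dim)
        h = x + self.pos_embed
        attn_out, _ = self.mha(h, h, h)      # every frame attends to every other frame
        return self.norm(x + attn_out)       # residual connection + normalization

frames = torch.randn(2, 16, 2048)            # 2 clips, 16 frames, 2048-d CNN features
out = TemporalSelfAttention()(frames)        # (2, 16, 2048) temporally refined features
```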
“…Tripathi et al [200] use a sliding window to penalize 3D joints of the same frames before and after the window strides. Wan et al [148] use a series of learnable linear regressors to decode joint rotations in a hierarchical order. Some objective terms are predefined empirically or learned from large motion capture datasets [86], [92].…”
Section: Recovery From Monocular Videos (mentioning)
confidence: 99%
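A hedged sketch of hierarchical rotation decoding in the spirit described above: one linear regressor per joint, evaluated in kinematic order, with the parent joint's predicted rotation concatenated to the input feature. The joint tree, feature size, and 6D rotation representation are assumptions, not the exact design of [148].

```python
import torch
import torch.nn as nn

PARENTS = [-1, 0, 1, 2, 3]   # hypothetical 5-joint kinematic chain; -1 marks the root
ROT_DIM = 6                  # 6D rotation representation (an assumption)

class HierarchicalRotationDecoder(nn.Module):
    """One linear regressor per joint; children are conditioned on their parent's rotation."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.regressors = nn.ModuleList([
            nn.Linear(feat_dim + (0 if p < 0 else ROT_DIM), ROT_DIM)
            for p in PARENTS
        ])

    def forward(self, feat):                 # feat: (batch, feat_dim)
        rots = []
        for j, parent in enumerate(PARENTS): # parents precede children in PARENTS
            inp = feat if parent < 0 else torch.cat([feat, rots[parent]], dim=-1)
            rots.append(self.regressors[j](inp))
        return torch.stack(rots, dim=1)      # (batch, num_joints, ROT_DIM)

decoder = HierarchicalRotationDecoder()
print(decoder(torch.randn(4, 512)).shape)    # torch.Size([4, 5, 6])
```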
“…25,26 These network models contain huge numbers of parameters and may therefore be of limited use for video in some applications. Targeting video processing, some recent works 27,28 exploit transformer modules to capture temporal information from both past and future frames, demonstrating their validity and efficiency for estimating human pose 27 and even shape. 28 Multi-branch network models have shown advantages in coping with domain shift when transferring across different datasets.…”
Section: Transformer In Computer Vision (mentioning)
confidence: 99%
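As a rough illustration of the bidirectional temporal transformers mentioned above (not the cited works' code), the sketch below runs a standard transformer encoder over a window of past, current, and future frame features; with no causal mask, every position attends in both temporal directions. The window size and feature dimension are assumptions.

```python
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True)
temporal_encoder = nn.TransformerEncoder(layer, num_layers=2)

window = torch.randn(1, 9, 256)      # 9-frame window: 4 past, the current frame, 4 future
refined = temporal_encoder(window)   # no attention mask, so attention is bidirectional
center = refined[:, 4]               # refined feature for the centre (current) frame
```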
“…Generally speaking, multi-frame pose estimation approaches [9,26,29,37,46,65,66] show advantages over single-frame ones. Specifically, some works apply temporal models (e.g., GRUs [9,29,37], TCNs [46,65], and Transformers [59,69]) for feature extraction, ensuring the pose estimators have continuous inputs on time sequences. Other methods employ regularizers or loss functions for smoothness [26,41,43,54,57,67] to constrain the temporal consistency across successive frames.…”
Section: The Jitter Problem From Pose Estimators (mentioning)
confidence: 99%
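For illustration, a generic first-order smoothness penalty of the kind such regularizers use: it discourages large frame-to-frame changes in the predicted 3D joints. The joint count and the absence of any weighting are assumptions; the cited works each define their own specific terms.

```python
import torch

def smoothness_loss(joints_3d):
    """joints_3d: (batch, T, J, 3) predicted 3D joint positions over T frames."""
    velocity = joints_3d[:, 1:] - joints_3d[:, :-1]   # frame-to-frame displacement
    return velocity.norm(dim=-1).mean()               # mean per-joint displacement magnitude

pred = torch.randn(2, 16, 24, 3)                      # e.g. 24 joints per frame (assumed)
print(smoothness_loss(pred))                          # scalar penalty added to the main pose loss
```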