2023
DOI: 10.1049/cvi2.12172
Video2mesh: 3D human pose and shape recovery by a temporal convolutional transformer network

Abstract: From a 2D video of a person in action, human mesh recovery aims to infer the 3D human pose and shape frame by frame. Despite progress on video-based human pose and shape estimation, it remains challenging to guarantee high accuracy and smoothness simultaneously. To tackle this problem, we propose Video2mesh, a temporal convolutional transformer (TConvTransformer)-based temporal network that recovers accurate and smooth human meshes from 2D video. The temporal convolution block achieves the sequence…

Cited by 1 publication (2 citation statements) · References 40 publications
“…The poses are shifted to a common origin point and rescaled to the same size, which makes our model independent of the subject's size and position in the frame. The temporal information is exploited using the LSTM network [17]. Though FCNs suffer from the vanishing gradient problem, it can be eliminated by using residual connections in the network.…”
Section: Methods
confidence: 99%
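The residual-connection remedy mentioned in the citation above can be sketched as follows. This is a minimal illustrative example, not code from the cited paper; the layer sizes, weight scales, and function names are hypothetical:

```python
import numpy as np

def relu(x):
    # Element-wise rectified linear activation.
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # Two linear layers with a skip connection: y = x + W2 * relu(W1 * x).
    # The identity path carries gradients through unattenuated, which is
    # the standard remedy for vanishing gradients in deep stacks.
    return x + w2 @ relu(w1 @ x)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # hypothetical 4-d feature vector
w1 = rng.standard_normal((4, 4)) * 0.01
w2 = rng.standard_normal((4, 4)) * 0.01

y = residual_block(x, w1, w2)
# With near-zero weights the block is close to the identity map,
# illustrating why residual stacks are easy to optimise from scratch.
```

Because the residual branch starts near zero, the block behaves like the identity at initialisation and only gradually learns a correction, which is why such connections stabilise training of deep temporal networks.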
“…3. Readout: Following the message passing step, the final node representations find application in downstream tasks, such as node classification, link prediction, or graph-level prediction [17].…”
Section: Preliminaries
confidence: 99%
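The message-passing-then-readout pattern described in this citation can be sketched as below. This is a generic illustrative example assuming mean aggregation and mean pooling; the aggregation scheme and graph are hypothetical, not taken from the citing work:

```python
import numpy as np

def message_passing(h, adj):
    # One round of mean-aggregation message passing: each node updates
    # its representation with the average of itself and its neighbours.
    deg = adj.sum(axis=1, keepdims=True) + 1.0   # +1 counts the node itself
    return (h + adj @ h) / deg

def readout(h):
    # Graph-level readout: mean-pool the final node representations into
    # a single vector usable for graph-level prediction.
    return h.mean(axis=0)

# Toy path graph 0-1-2 with 2-d node features (hypothetical data).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

h1 = message_passing(h, adj)   # updated node representations
g = readout(h1)                # graph-level embedding
```

For node-level tasks (node classification, link prediction) one would use the rows of `h1` directly; the pooled vector `g` is what the "graph-level prediction" case in the quote refers to.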