2021
DOI: 10.48550/arxiv.2111.10677
Preprint

VideoPose: Estimating 6D object pose from videos

Abstract: We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos. Our approach leverages the temporal information from a video sequence, and is computationally efficient and robust to support robotic and AR domains. Our proposed network takes a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make predictions at each frame. Experimental evaluation on the YCB-Video dataset shows that ou…
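
The abstract outlines the architecture at a high level: per-frame visual features from a pre-trained 2D object detector are aggregated by a recurrent network that emits a pose at every frame. Below is a minimal sketch of that recurrent-aggregation pattern; it is not the authors' released code, and the feature dimension, hidden size, and quaternion head are illustrative assumptions.

```python
# Illustrative sketch (not the authors' implementation): per-frame features
# from a frozen 2D-detector backbone are aggregated by a GRU, and a small
# head regresses a 6D pose (translation + unit quaternion) at every frame.
# feat_dim, hidden, and the quaternion parameterization are assumptions.
import torch
import torch.nn as nn

class RecurrentPoseNet(nn.Module):
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.trans_head = nn.Linear(hidden, 3)   # (x, y, z) translation
        self.rot_head = nn.Linear(hidden, 4)     # quaternion, normalized below

    def forward(self, feats):                    # feats: (B, T, feat_dim)
        h, _ = self.gru(feats)                   # temporal aggregation
        t = self.trans_head(h)                   # (B, T, 3) per-frame translation
        q = self.rot_head(h)
        q = q / (q.norm(dim=-1, keepdim=True) + 1e-8)  # unit quaternion per frame
        return t, q

# Usage: features would come from a pre-trained 2D detector's ROI crops;
# random tensors stand in here.
feats = torch.randn(2, 8, 512)                   # 2 clips, 8 frames each
trans, quat = RecurrentPoseNet()(feats)
print(trans.shape, quat.shape)                   # (2, 8, 3) and (2, 8, 4)
```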

Cited by 1 publication (1 citation statement)
References: 29 publications
“…Furthermore, The VideoPose approach, an open-source video-based three-dimensional pose estimation method, simplifies the complexities of learning from two-dimensional data. This method enables the acquisition of three-dimensional key body positions without requiring intricate feature learning processes, thanks to its foundation on time-dilated convolutional networks, which exhibit strong performance in dynamic environments ( 30–32 ). The integration of these methods empowers the extraction and comprehensive analysis of spatiotemporal motion insights derived from camera-generated data.…”
Section: Introduction
Confidence: 99%
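
The quoted passage attributes VideoPose's temporal modeling to time-dilated convolutional networks, as used in video-based 3D human pose lifting. A minimal sketch of that dilated-convolution pattern follows; the joint count, channel widths, and dilation schedule are illustrative assumptions, not the cited method's exact configuration.

```python
# Minimal sketch of the time-dilated convolution idea the quoted passage
# describes: stacked 1D convolutions with increasing dilation grow the
# temporal receptive field over a sequence of 2D keypoints, and a final
# 1x1 layer lifts each frame to 3D. J, channel widths, and the dilation
# schedule (1, 3, 9) are illustrative assumptions.
import torch
import torch.nn as nn

J = 17  # assumed number of body joints

lifter = nn.Sequential(
    nn.Conv1d(2 * J, 128, kernel_size=3, dilation=1, padding=1),
    nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=3, dilation=3, padding=3),
    nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=3, dilation=9, padding=9),
    nn.ReLU(),
    nn.Conv1d(128, 3 * J, kernel_size=1),  # per-frame 3D joint positions
)

kpts_2d = torch.randn(1, 2 * J, 243)       # (batch, 2*joints, frames)
kpts_3d = lifter(kpts_2d)                  # (1, 3*J, 243): 3D joints per frame
print(kpts_3d.shape)
```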