2021
DOI: 10.1109/tce.2021.3057137

End-to-End 6DoF Pose Estimation From Monocular RGB Images

Abstract: We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenarios. Our approach can efficiently detect the traffic participants from a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The proposed method, 6D-VNet, extends Mask R-CNN by adding customised heads for predicting the vehicle's finer class, rotation and translation. In contrast to previous methods, it is trained end-to-end. Furthermore, we show that the inc…

Cited by 18 publications (9 citation statements)
References 45 publications
“…In particular, quaternion is a popular representation. It is utilized such as in PoseCNN [1], PoseNet [10], and 6D-VNet [11]. Another 4-dimensional representation is an axis-rotation representation, introduced such as in MapNet [12].…”
Section: A Continuous Rotation Representations (mentioning)
confidence: 99%
“…The single end-to-end network was also used to tackle 6DoF pose estimation from single monocular RGB images [29,30]. Bayesian PoseNet was presented to achieve end-to-end train, and the uncertainty of repositioning was obtained through the posterior distribution from Bayesian CNN weights.…”
Section: Related Work (mentioning)
confidence: 99%
“…Bayesian PoseNet was presented to achieve end‐to‐end train, and the uncertainty of repositioning was obtained through the posterior distribution from Bayesian CNN weights. Both PoseNet [27] and Bayesian PoseNet models [30] showed great performance in indoor and outdoor relocation environment. The work in [27] was extended to address the difficulty in setting the super parameters of the loss function in the PoseNet by presenting a more basic theoretical treatment for, and explored many new loss functions for learning the camera attitude based on geometry and scene re‐projection errors.…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep learning technology has achieved considerably good performance in computer vision, natural language processing, speech recognition, and other fields. In recent years, ModelNet (Wu et al., 2015), ShapeNet (Yang et al., 2021), ScanNet (Zou et al., 2021), and other publicly available datasets have also driven research in 3D model classification based on deep learning. 3D model classification methods based on deep learning can be divided into three categories based on the representation of the input data: voxel-based, point cloud-based, and multi-view-based.…”
confidence: 99%