2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00776

Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation

Abstract: Estimating the 6D pose of objects using only RGB images remains challenging because of problems such as occlusion and symmetries. It is also difficult to construct 3D models with precise texture without expert knowledge or specialized scanning devices. To address these problems, we propose a novel pose estimation method, Pix2Pose, that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture is designed to estimate the 3D coordinates and expected errors per pixel. …

Cited by 426 publications (332 citation statements)
References 30 publications
“…Traditional methods identify hand-crafted features to localize an object model within a scene (Klank et al, 2009; Srinivasa et al, 2010; Chitta et al, 2012a) but more recently advances for pose estimation have been made by the application of deep learning (Xiang et al, 2018; Park et al, 2019b; Zakharov et al, 2019) and grasping pipelines achieve high success rate (Tremblay et al, 2018; Wang C. et al, 2019). The main limitation of this direction of research, however, is the closed-world assumption.…”
Section: Related Work (mentioning)
confidence: 99%
“…For the reconstruction of VD-NOC values, the standard L1 loss is applied for each pixel p. Since background pixels are masked out, their values are easy to predict. Hence, the loss values for pixels on the object masks $M_i \in \mathbb{R}^{W \times H}$ are weighted by a factor of 3 to more precisely predict the values of pixels in the object masks (Park et al, 2019b). The reconstruction loss is thus given by,…”
Section: Training Objective (mentioning)
confidence: 99%
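
The weighting described in the statement above can be illustrated with a minimal NumPy sketch. The quoted equation is elided in the source, so this is not the citing paper's exact formula: the function name `weighted_l1_loss`, the array layout, and the normalization by the total pixel count are all assumptions made for illustration. Only the per-pixel L1 error and the factor of 3 on object-mask pixels come from the quoted text.

```python
import numpy as np

def weighted_l1_loss(pred, target, mask, object_weight=3.0):
    """Per-pixel L1 loss with object-mask pixels weighted more heavily.

    pred, target: (H, W, 3) arrays of predicted / ground-truth coordinates.
    mask: (H, W) array, nonzero on object pixels.
    object_weight: weight for object pixels (3 in the quoted statement).
    Averaging over all pixels is an assumption of this sketch.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # L1 error summed over the three coordinate channels, shape (H, W).
    abs_err = np.abs(pred - target).sum(axis=-1)
    # Object pixels get `object_weight`; background pixels keep weight 1.
    weights = np.where(np.asarray(mask) > 0, object_weight, 1.0)
    return float((weights * abs_err).mean())

# Hypothetical usage on random data.
rng = np.random.default_rng(0)
pred = rng.random((4, 4, 3))
target = rng.random((4, 4, 3))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
loss = weighted_l1_loss(pred, target, mask)
```

With `object_weight=1.0` the function reduces to a plain mean L1 loss, which makes the effect of the mask weighting easy to compare.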
“…Euclidean distance, scalable nearest neighbor search method, and CNN are integrated as an efficient model to capture both the object identity and 3D pose [17,18]. Park et al proposed a novel architecture Pix2Pose based on CNN [19], which predicted the coordinates of each pixel after feature extraction, and then calculated the position and orientation by voting. Although this effort largely improved the robustness of pose estimation, especially under heavy occlusion, the computation cost was relatively expensive considering a comparable accuracy can be achieved with other methods.…”
Section: Related Work (mentioning)
confidence: 99%
“…3. In this novel method Pix2Pose [16] we predict these coloured images to build a 2D-3D correspondence per pixel directly without any feature matching operation.…”
Section: Object Detection and Pose Estimation (mentioning)
confidence: 99%
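
The idea of reading 2D-3D correspondences directly from a predicted coordinate image, as the statement above describes, might be sketched as follows. This is an illustrative sketch and not the Pix2Pose implementation: the function name `correspondences_from_coordinate_image`, the assumption that coordinates are encoded in [0, 1] per channel, and the `extent_min` / `extent_max` object bounds are all hypothetical.

```python
import numpy as np

def correspondences_from_coordinate_image(coord_img, mask, extent_min, extent_max):
    """Collect one 2D-3D correspondence per object pixel.

    coord_img: (H, W, 3) predicted coordinate image, assumed to lie in [0, 1].
    mask: (H, W) array, nonzero where the object is predicted.
    extent_min, extent_max: length-3 bounds of the object model (assumed known).
    Returns (points_2d, points_3d) with shapes (N, 2) and (N, 3).
    """
    coord_img = np.asarray(coord_img, dtype=np.float64)
    extent_min = np.asarray(extent_min, dtype=np.float64)
    extent_max = np.asarray(extent_max, dtype=np.float64)
    # Row indices are image v, column indices are image u.
    vs, us = np.nonzero(np.asarray(mask))
    points_2d = np.stack([us, vs], axis=1).astype(np.float64)
    # Map the normalized values back to object-space coordinates.
    normalized = coord_img[vs, us]
    points_3d = extent_min + normalized * (extent_max - extent_min)
    return points_2d, points_3d

# Hypothetical usage: one object pixel at column 2, row 1.
coord_img = np.zeros((2, 3, 3))
coord_img[1, 2] = [0.5, 0.0, 1.0]
mask = np.zeros((2, 3), dtype=bool)
mask[1, 2] = True
pts2d, pts3d = correspondences_from_coordinate_image(
    coord_img, mask, extent_min=[-1, -2, -3], extent_max=[1, 2, 3]
)
```

The resulting point pairs are the kind of input a PnP solver with RANSAC (for example OpenCV's `cv2.solvePnPRansac`) expects, together with the camera intrinsics, so no separate feature-matching step is needed.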
“…In these cases the object pose will not be correct or wrongly attached to a symmetric view resulting in divergence of network training. The ideas presented in Pix2Pose tackle these problems [16]. For occlusion, the pixel-wise prediction is performed for not only the visible area but also the occluded region.…”
Section: Object Detection and Pose Estimation (mentioning)
confidence: 99%