2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00203

DPOD: 6D Pose Object Detector and Refiner

Abstract: In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. Our method, named DPOD (Dense Pose Object Detector), estimates dense multi-class 2D-3D correspondence maps between an input image and available 3D models. Given the correspondences, a 6DoF pose is computed via PnP and RANSAC. An additional RGB pose refinement of the initial pose estimates is performed using a custom deep learning-based refinement scheme. Our results and comparison to a vast num…
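The pose-computation step named in the abstract (PnP inside a RANSAC loop over the predicted dense 2D-3D correspondences) can be illustrated with a short sketch. This is not the authors' implementation; it uses OpenCV's solvePnPRansac, and the correspondence arrays, intrinsics, and thresholds are illustrative placeholders.

import numpy as np
import cv2

def pose_from_correspondences(pts_3d, pts_2d, K):
    # pts_3d: (N, 3) model-space points, pts_2d: (N, 2) matching pixels,
    # K: (3, 3) camera intrinsics. Returns a 4x4 object-to-camera pose,
    # or None if RANSAC fails to find a consensus set.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float64), pts_2d.astype(np.float64), K, None,
        reprojectionError=3.0,  # inlier threshold in pixels (illustrative)
        iterationsCount=150)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)  # axis-angle vector -> rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T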

Cited by 453 publications (434 citation statements)
References 34 publications (95 reference statements)
“…Unlike the previous categories of methods, i.e., classification-based and regression-based, this category performs the classification and regression tasks within a single architecture. The methods can firstly do the classification, the outcomes of which are cured in a regression-based refinement step [105], [84], [78], [166] or vice versa [75], or can do the classification and regression in a single-shot process [87], [145], [101], [106], [100], [148], [103], [102], [30], [37], [162].…”
Section: B. Regression
confidence: 99%
“…DeepContext [78] is trained on partially synthetic depth images that exhibit a variety of local object appearances, and real data are used to fine-tune the method. The 2D-driven 3D methods and the 3D BB detectors work at the level of categories, whereas the 6D methods [30], [37], [166], [162] work at instance level.…”
Section: Classification and Regression
confidence: 99%
“…Another feature not well covered by synthetic data is proper illumination. Recent methods [21,29,16,56] prerender a number of synthetic images featuring different light conditions. Here, we instead implement differentiable lighting based on the simple Phong model [35], which is fully operated by the network.…”
Section: Light Module (L)
confidence: 99%
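The Phong reflection model referenced in the quote above reduces to a few differentiable operations, which is what makes it usable inside a network. Below is a minimal sketch; the unit-vector inputs and material coefficients are hypothetical placeholders, not values from the cited paper.

import numpy as np

def phong_intensity(normals, light_dir, view_dir,
                    k_a=0.1, k_d=0.6, k_s=0.3, shininess=32.0):
    # normals, light_dir, view_dir: (..., 3) unit vectors; returns the
    # scalar Phong intensity as ambient + diffuse + specular terms.
    n_dot_l = np.clip((normals * light_dir).sum(-1), 0.0, None)
    # reflect the light direction about the surface normal
    r = 2.0 * n_dot_l[..., None] * normals - light_dir
    r_dot_v = np.clip((r * view_dir).sum(-1), 0.0, None)
    return k_a + k_d * n_dot_l + k_s * r_dot_v ** shininess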
“…When it comes to deep learning methods, training detectors on real data yields the best results. However, the fact that 3D models of the objects are available and that training data can be synthesized by rendering them has been exploited in only a few studies, most notably by detectors such as SSD6D [20], AAE [29] and DPOD [34]. It is remarkable that all deep learning 6DoF object detectors, whether trained on real or synthetic data, use a single neural network per object, in contrast to 2D object detectors such as YOLO [24], SSD [21] or R-CNNs [14,13,25,15], which use one network for all object classes.…”
Section: Introduction
confidence: 99%
“…In the following sections, besides describing the steps of the dataset creation pipeline, we also present detection and 6D pose estimation results on all the newly defined benchmarks for a recently introduced method, the Dense Pose Object Detector (DPOD) [34]. The method is trained either on all objects at once or only on the objects present in the test scene, using strictly synthetic renderings of the provided 3D models.…”
Section: Introduction
confidence: 99%