Abstract: Object detection and pose estimation are strict requirements for many robotic grasping and manipulation applications, endowing robots with the ability to grasp objects with different properties in cluttered scenes and under various lighting conditions. This work proposes the framework i2c-net to extract the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images. The network is trained on a custom-made synthetic photo-r…
“…Indeed, the statistics reported by the BOP benchmark show that scores suffer a substantial drop even at low levels of occlusion, as demonstrated by the roughly 30% performance gap between LINEMOD and Occluded-LINEMOD, which contains the same objects but partially occluded. Estimating the 6D pose of objects is an active field with important practical implications, and after 2018, other works [18], [19] have been published, showing a margin of improvement in several respects. Therefore, the authors believe that in the near future such methods can be employed in the proposed benchmark to automatically detect the occlusion percentage of cluttered scenes in the evaluation metric; in the meantime, manual segmentation guarantees more accurate measurements.…”
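The occlusion percentage mentioned in this excerpt can in principle be derived from segmentation masks. Below is a minimal sketch, assuming per-object binary masks for both the full (amodal) silhouette and the actually visible pixels; the function name and interface are illustrative, not the benchmark's actual API:

```python
import numpy as np

def occlusion_percentage(amodal_mask: np.ndarray, visible_mask: np.ndarray) -> float:
    """Fraction (in %) of an object's projected surface hidden by clutter.

    amodal_mask:  boolean mask of the full object silhouette (as if unoccluded)
    visible_mask: boolean mask of the pixels actually visible in the image
    Both inputs are hypothetical; real pipelines would obtain them from
    rendered ground truth or manual segmentation.
    """
    total = amodal_mask.sum()
    if total == 0:
        return 0.0
    visible = np.logical_and(amodal_mask, visible_mask).sum()
    return 100.0 * (1.0 - visible / total)
```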
Autonomous and reliable robotic grasping is a desirable capability in robotic manipulation and is still an open problem. Standardized benchmarks are important tools for evaluating and comparing robotic grasping and manipulation systems across different research groups and for sharing best practices with the community so that errors can be learned from. An ideal benchmarking protocol should encompass the different aspects underpinning grasp execution, including the mechatronic design of grippers, planning, perception, and control, giving information on each aspect as well as on the overall problem. This article gives an overview of the benchmarks, datasets, and competitions proposed and adopted in recent years and presents a novel benchmark with protocols for different tasks that evaluate both the individual components of the system and the system as a whole, introducing an evaluation metric that allows for a fair comparison in highly cluttered scenes by taking into account the difficulty of the clutter, as sketched below. A website dedicated to the benchmark, containing information on the different tasks, maintaining the leaderboards, and serving as a contact point for the community, is also provided.
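One way such a metric could account for clutter difficulty is to weight each successful pick by a per-attempt difficulty factor. The following sketch is purely illustrative of that idea and is not the benchmark's published formula; the inputs and the weighting scheme are assumptions:

```python
def clutter_weighted_score(picks, difficulty):
    """Hypothetical difficulty-weighted grasping score.

    picks:      list of booleans, one per attempted pick (True = success)
    difficulty: list of per-attempt weights in (0, 1], e.g. derived from
                the occlusion percentage of the target object
    Returns a score in [0, 1]; harder attempts contribute more when solved.
    """
    weighted = sum(w for ok, w in zip(picks, difficulty) if ok)
    total = sum(difficulty)
    return weighted / total if total > 0 else 0.0
```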
“…The boom of deep learning has significantly improved object pose estimation. A series of methods have been proposed to holistically estimate object poses from monocular color images [1], [26], [27], [28] or with the aid of depth sensors [29], [30], [31], [32], [33], [34]. These methods take advantage of CNNs' regression ability to learn mapping functions from the observed images to object poses.…”
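The holistic regression approach described in this excerpt maps an image crop directly to a pose. Below is a minimal PyTorch-style sketch of the idea; the architecture, layer sizes, and names are assumptions for illustration and do not correspond to any of the cited networks:

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Toy CNN that regresses a 6D pose (3D translation + unit quaternion)
    from an RGB crop of an object. Illustrative only; the cited works use
    far more elaborate backbones and training losses."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 7)  # 3 translation + 4 quaternion values

    def forward(self, x):
        out = self.head(self.backbone(x))
        t, q = out[:, :3], out[:, 3:]
        q = q / q.norm(dim=1, keepdim=True)  # normalize to a unit quaternion
        return t, q
```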
6-DoF object pose estimation from a monocular image is a challenging problem in which a post-refinement procedure is generally needed for high-precision estimation. In this paper, we propose a framework, dubbed RNNPose, based on a recurrent neural network (RNN) for object pose refinement, which is robust to erroneous initial poses and occlusions. During the recurrent iterations, object pose refinement is formulated as a non-linear least squares problem based on the estimated correspondence field (between a rendered image and the observed image). The problem is then solved by a differentiable Levenberg-Marquardt (LM) algorithm, enabling end-to-end training. The correspondence field estimation and pose refinement are conducted alternately in each iteration to improve the object poses. Furthermore, to improve robustness against occlusion, we introduce a consistency-check mechanism based on the learned descriptors of the 3D model and observed 2D images, which downweights the unreliable correspondences during pose optimization. We evaluate RNNPose on several public datasets, including LINEMOD, Occlusion-LINEMOD, YCB-Video and TLESS. We demonstrate state-of-the-art performance and strong robustness against severe clutter and occlusion in the scenes. Extensive experiments validate the effectiveness of our proposed method. Moreover, the extended system based on RNNPose successfully generalizes to multi-instance scenarios and achieves top-tier performance on the TLESS dataset.
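To make the least-squares formulation concrete, here is a minimal sketch of one damped, weighted Gauss-Newton/LM update on a 6-vector pose increment of the kind the abstract describes, with per-correspondence weights playing the role of the consistency check. Variable names, shapes, and the solver interface are assumptions, not RNNPose's actual code:

```python
import torch

def lm_pose_step(residuals, jacobian, weights, lam=1e-3):
    """One damped Levenberg-Marquardt update for a pose increment.

    residuals: (N,) correspondence-field errors at the current pose
    jacobian:  (N, 6) derivative of each residual w.r.t. the pose increment
               (e.g. an se(3) twist)
    weights:   (N,) per-correspondence confidences; unreliable matches get
               small weights, mimicking the paper's consistency check
    All operations are differentiable tensor ops, so the step can sit
    inside an end-to-end training graph.
    """
    W = weights.unsqueeze(1)                     # (N, 1)
    JtWJ = jacobian.t() @ (W * jacobian)         # (6, 6) weighted normal matrix
    JtWr = jacobian.t() @ (weights * residuals)  # (6,) weighted gradient term
    damped = JtWJ + lam * torch.diag(torch.diagonal(JtWJ))  # LM damping
    delta = torch.linalg.solve(damped, -JtWr)    # pose increment
    return delta
```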
“…Therefore, an important goal in this ongoing industrial revolution is to make such algorithms robust to clutter in order to increase the flexibility of the next generation of robots. Estimating the pose of objects is an active field with important practical implications, and in recent years some works (Song et al., 2020; Remus et al., 2023) have been published showing a margin of improvement in several respects.…”
Several datasets have been proposed in the literature, focusing on object detection and pose estimation. The majority of them focus on recognizing isolated objects or estimating the pose of objects in well-organized scenarios. This work introduces a novel dataset that aims to stress vision algorithms in the difficult task of object detection and pose estimation in highly cluttered scenes, for the specific case of bin picking in the Cluttered Environment Picking Benchmark (CEPB). The dataset provides about 1.5M virtually generated photo-realistic images (RGB + depth + normals + segmentation) of 50K annotated cluttered scenes mixing rigid, soft, and deformable objects of varying sizes used in existing robotic picking benchmarks, together with their 3D models (40 objects). The images cover three different camera positions, three lighting conditions, and multiple High Dynamic Range Imaging (HDRI) maps for domain randomization purposes. The annotations contain the 2D and 3D bounding boxes of the involved objects, the centroids’ poses (translation + quaternion), and the visibility percentage of the objects’ surfaces. Nearly 10K images of isolated objects are also provided to enable simple tests and comparison with the more complex cluttered-scene tests. A baseline obtained with the DOPE neural network is reported to highlight the challenges introduced by the novel dataset.
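A per-object annotation record of the kind listed in this abstract might be represented as follows. This is a sketch under stated assumptions; the field names, conventions, and units are illustrative and not the CEPB dataset's actual schema:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ObjectAnnotation:
    """Hypothetical per-object record mirroring the annotations described
    in the abstract; not the dataset's real format."""
    object_id: str
    bbox_2d: Tuple[int, int, int, int]              # (x_min, y_min, x_max, y_max), pixels
    bbox_3d: Tuple[float, ...]                      # e.g. 8 corner coordinates, meters
    translation: Tuple[float, float, float]         # centroid position (x, y, z), meters
    quaternion: Tuple[float, float, float, float]   # centroid orientation (w, x, y, z)
    visibility: float                               # visible fraction of the surface, [0, 1]
```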