Abstract: In robotic bin-picking applications, the perception of texture-less, highly reflective parts is a valuable but challenging task. The high glossiness can introduce fake edges in RGB images and inaccurate depth measurements, especially in heavily cluttered bin scenarios. In this paper, we present the ROBI (Reflective Objects in BIns) dataset, a public dataset for 6D object pose estimation and multi-view depth fusion in robotic bin-picking scenarios. The ROBI dataset includes a total of 63 bin-picking scenes captur…
“…The proposed method, LCRN, was pre-trained on the synthetic stereo dataset SceneFlow [14] and fine-tuned on the ROBI [18] dataset. Since the number of stereo pairs is crucial for network training, pre-training on a large synthetic dataset is necessary.…”
Deep learning methods have been widely used in recent years for stereo matching, a key step in machine vision measurement. State-of-the-art methods are three-dimensional (3D) end-to-end networks that form a cost volume by concatenating extracted features and process it with 3D modules. Despite their strong accuracy, 3D networks mostly have high computational cost, heavy memory usage, and long run-time. This paper proposes the Local Cost Volume Refinement Network (LCRN), a two-dimensional (2D) end-to-end network composed of feature extraction, disparity initialization, disparity refinement, and disparity mergence modules. LCRN initializes disparity maps using a correlation layer and residual blocks, and refines them using local cost volumes, residual blocks, and disparity regression. Local cost volumes are constructed by warping the right features and applying a small disparity shift. To verify the effectiveness of LCRN, the network was pre-trained on the SceneFlow dataset and fine-tuned on the ROBI dataset. The network is evaluated on the test set of ROBI for robotic bin-picking. Experimental results show that LCRN maintains competitive accuracy while running fast and requiring less memory.
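The local cost volume idea in the abstract above can be illustrated with a minimal NumPy sketch. This is not the LCRN implementation: the function names (`warp_right_to_left`, `local_cost_volume`, `refine_disparity`), the shift range of ±2 pixels, the nearest-neighbour warping, and the winner-take-all refinement are all assumptions made for illustration. The abstract says the network refines with residual blocks and disparity regression, which this sketch replaces with a simple argmax.

```python
import numpy as np

def warp_right_to_left(right_feat, disparity):
    """Warp right-view features (C, H, W) into the left view.

    Each left pixel x samples the right feature at x - d. Nearest-neighbour
    sampling is an assumption made for brevity; out-of-range samples are zeroed.
    """
    _, h, w = right_feat.shape
    xs = np.arange(w)[None, :] - np.rint(disparity).astype(int)  # (H, W)
    valid = (xs >= 0) & (xs < w)
    xs = np.clip(xs, 0, w - 1)
    rows = np.arange(h)[:, None]
    warped = right_feat[:, rows, xs]  # (C, H, W)
    return warped * valid[None]

def local_cost_volume(left_feat, right_feat, init_disp, shifts=(-2, -1, 0, 1, 2)):
    """Correlate left features with right features warped at init_disp + shift.

    Returns an array of shape (len(shifts), H, W); higher means a better match.
    """
    costs = []
    for s in shifts:
        warped = warp_right_to_left(right_feat, init_disp + s)
        costs.append((left_feat * warped).mean(axis=0))
    return np.stack(costs, axis=0)

def refine_disparity(init_disp, cost_volume, shifts=(-2, -1, 0, 1, 2)):
    """Winner-take-all refinement (a stand-in for learned disparity regression)."""
    best = np.argmax(cost_volume, axis=0)
    return init_disp + np.asarray(shifts, dtype=float)[best]
```

Because only a handful of shifts around the initial disparity are evaluated, the volume has `len(shifts)` slices rather than one per candidate disparity over the full range, which is presumably where the memory and run-time savings described in the abstract come from.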
“…In our experiments, we use an industrial-grade SLI camera (IDS Ensenso N35), which is equipped with two cameras and a visible-light projector. We evaluate our method on the ROBI dataset [8], which was captured using this camera. The ROBI dataset provides multi-view depth maps and pattern-projected images for shiny objects.…”
Section: A Datasets and Evaluation Metrics
Citation type: mentioning (confidence: 99%)
“…We evaluate our framework on the challenging ROBI dataset [8]. We first evaluate our pose refinement with passive viewpoint selection, showing that our refinement module outperforms the widely used iterative closest point (ICP) approach when given the same input depth measurements.…”
6D pose estimation of textureless shiny objects has become an essential problem in many robotic applications. Many pose estimators require high-quality depth data, often measured by structured light cameras. However, when objects have shiny surfaces (e.g., metal parts), these cameras fail to sense complete depths from a single viewpoint due to the specular reflection, resulting in a significant drop in the final pose accuracy. To mitigate this issue, we present a complete active vision framework for 6D object pose refinement and next-best-view prediction. Specifically, we first develop an optimization-based pose refinement module for the structured light camera. Our system then selects the next best camera viewpoint to collect depth measurements by minimizing the predicted uncertainty of the object pose. Compared to previous approaches, we additionally predict measurement uncertainties of future viewpoints by online rendering, which significantly improves the next-best-view prediction performance. We test our approach on the challenging real-world ROBI dataset. The results demonstrate that our pose refinement method outperforms the traditional ICP-based approach when given the same input depth data, and our next-best-view strategy can achieve high object pose accuracy with significantly fewer viewpoints than the heuristic-based policies.
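The viewpoint-selection step described in the abstract above, choosing the next view by minimizing the predicted pose uncertainty, can be sketched under a linear-Gaussian assumption. This is a generic illustration and not the authors' method: the function names, the use of a measurement Jacobian and covariance per candidate view, and the trace of the posterior covariance as the score are all assumptions. In the paper, the per-view measurement uncertainty is predicted by online rendering. Here it is simply passed in.

```python
import numpy as np

def posterior_covariance(prior_cov, jacobian, meas_cov):
    """Information-form Gaussian update: (P^-1 + J^T R^-1 J)^-1."""
    info = np.linalg.inv(prior_cov) + jacobian.T @ np.linalg.inv(meas_cov) @ jacobian
    return np.linalg.inv(info)

def select_next_best_view(prior_cov, candidates):
    """Pick the candidate view with the smallest predicted pose uncertainty.

    candidates: list of (jacobian, meas_cov) pairs, one per viewpoint
                (hypothetical inputs standing in for rendered predictions).
    Returns (best_index, scores), where each score is the trace of the
    predicted posterior covariance for that viewpoint.
    """
    scores = [
        float(np.trace(posterior_covariance(prior_cov, jac, cov)))
        for jac, cov in candidates
    ]
    return int(np.argmin(scores)), scores
```

A view that observes more pose dimensions, or observes them with lower predicted noise, yields a smaller trace and is preferred. A viewpoint whose depth is expected to be corrupted by specular reflection would be given a large `meas_cov` and would therefore score poorly.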
“…Manufacturing use cases present unique challenges. Many industrial objects are reflective and textureless, with scratches or saw patterns affecting their appearance [32,4]. Parts are often stacked in dense compositions, with many occlusions.…”