2019 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2019.8793744

Segmenting Unknown 3D Objects from Real Depth Images using Mask R-CNN Trained on Synthetic Data

Abstract: The ability to segment unknown objects in depth images has potential to enhance robot skills in grasping and object tracking. Recent computer vision research has demonstrated that Mask R-CNN can be trained to segment specific categories of objects in RGB images when massive hand-labeled datasets are available. As generating these datasets is time-consuming, we instead train with synthetic depth images. Many robots now use depth sensors, and recent results suggest training on synthetic depth data can transfer successfully…
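
Since the paper feeds single-channel depth images into a network whose backbone was designed for RGB input, a common preprocessing step is to normalize the depth map and replicate it across three channels. The minimal Python sketch below illustrates that idea; the function name `depth_to_network_input` and the clipping range are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def depth_to_network_input(depth_m, min_depth=0.25, max_depth=1.0):
    """Clip and normalize a metric depth image, then replicate it to three
    channels so it can be fed to a Mask R-CNN backbone that expects
    RGB-shaped input. The clipping range is a placeholder value."""
    depth = np.clip(depth_m, min_depth, max_depth)
    depth = (depth - min_depth) / (max_depth - min_depth)   # scale to [0, 1]
    depth_u8 = (255.0 * depth).astype(np.uint8)
    return np.dstack([depth_u8] * 3)                         # H x W x 3, uint8
```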

Cited by 177 publications (120 citation statements)
References 48 publications (84 reference statements)

“…This technique has been successfully used for object localization [48••], segmentation [74], robot control for pick-and-place [75], swing-peg-in-hole [76], opening a cabinet drawer [76], in-hand manipulation [77], one-handed Rubik's Cube solving [78], precise 6D pose regression in highly cluttered environments [20•], etc. Modifications propose an automatic scheduling of the intensity of the randomization based on the current performance of the system [78] or adapting simulation randomizations by using real-world data to identify distributions that are particularly suited for a successful transfer [76].…”
Section: Domain Randomization (mentioning)
confidence: 99%
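
As context for the domain randomization discussed in the statement above, the sketch below shows the core idea: each synthetic training image is rendered with simulation parameters drawn from broad distributions. All parameter names and ranges here are placeholder assumptions, not values from the cited works.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_scene_parameters():
    """Draw one set of randomized simulation parameters for rendering a
    synthetic training image. Names and ranges are placeholders."""
    return {
        "camera_height_m": rng.uniform(0.5, 1.0),      # distance of the camera above the scene
        "camera_tilt_deg": rng.uniform(-10.0, 10.0),   # perturbation of the viewing angle
        "num_objects": int(rng.integers(1, 11)),       # amount of clutter in the heap
        "depth_noise_std_m": rng.uniform(0.0, 0.005),  # simulated sensor noise level
    }

# Each synthetic image is rendered with freshly sampled parameters, so the
# trained model cannot overfit to a single simulated configuration.
```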
“…For our work, we chose an open-source Python implementation by Matterport [34] built on the Keras and TensorFlow frameworks. The model and its implementation have already gained popularity among researchers [35][36][37][38][39][40] for several reasons: the model is published under the MIT license, which allows users to modify it; it adopts the well-established ResNet CNN backbone [41] and recently introduced concepts such as the Feature Pyramid Network (FPN) [42] and RoI Align, which in terms of quality make Mask R-CNN superior to comparable models like Faster R-CNN; and the maximum accepted input image resolution (1024 × 1024 pixels) is high compared with many previously developed CNN models such as YOLO (up to 608 × 608 pixels) [43] or the Faster R-CNN [42] Python implementation (600 × 1000 pixels). The ability to analyze higher-resolution images is especially important when dealing with small objects like biological cells [35].…”
Section: CNN Quantification Methods (mentioning)
confidence: 99%
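
For readers unfamiliar with the Matterport implementation discussed in the statement above, a minimal inference sketch follows. It uses the repository's public `Config`/`MaskRCNN` interface; the weight file name, class count, and image dimensions are placeholders, and `image` is assumed to be a pre-loaded H × W × 3 uint8 array.

```python
# Inference with the Matterport Mask R-CNN implementation
# (https://github.com/matterport/Mask_RCNN). Weight file, class count,
# and image dimensions below are placeholders.
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "depth_seg"
    NUM_CLASSES = 1 + 1        # background + a single category-agnostic "object" class
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 1024       # the implementation accepts inputs up to 1024 x 1024

config = InferenceConfig()
model = modellib.MaskRCNN(mode="inference", config=config, model_dir="./logs")
model.load_weights("mask_rcnn_depth.h5", by_name=True)

# detect() returns bounding boxes, class IDs, scores, and per-instance masks.
results = model.detect([image], verbose=0)[0]
instance_masks = results["masks"]   # H x W x N boolean array
```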
“…When objects lack colour or textural information, this approach fails to produce reliable results. More recently, various Artificial Neural Networks (ANNs) have been used for 3D object recognition and 6D pose estimation [18, 19, 28, 29]. Gupta et al. [18] used both colour images and depth features to train a Convolutional Neural Network (CNN) model.…”
Section: Related Work (mentioning)
confidence: 99%
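
To make the colour-plus-depth idea in the statement above concrete, the sketch below stacks an RGB image with a normalized depth channel into a single CNN input. This is a simplified illustration rather than the specific encoding used in the cited work; published RGB-D approaches often use richer depth encodings (e.g., HHA) instead of a raw depth channel.

```python
import numpy as np

def fuse_rgb_and_depth(rgb_u8, depth_m, max_depth=2.0):
    """Stack an RGB image with a normalized depth channel to form a
    4-channel CNN input. Illustrative only; the depth cutoff is a
    placeholder, and richer depth encodings are common in practice."""
    depth = np.clip(depth_m / max_depth, 0.0, 1.0)
    depth_u8 = (255.0 * depth).astype(np.uint8)[..., None]   # H x W x 1
    return np.concatenate([rgb_u8, depth_u8], axis=-1)       # H x W x 4
```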
“…To improve the runtime performance of shape retrieval approaches, researchers [16] suggested moving the heavy computation to offline stages. With the affordability and accessibility of RGB-D sensors, researchers have proposed various object detection and pose estimation methods that use both optical and depth information [10, 17, 18, 19, 20]. Although these methods usually outperform approaches based on optical information alone, depth sensors have a limited capturing angle and are more sensitive to illumination conditions.…”
Section: Introduction (mentioning)
confidence: 99%