2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)
DOI: 10.1109/humanoids.2018.8625071

Visual Manipulation Relationship Network for Autonomous Robotics

Abstract: Robotic grasping is one of the most important fields in robotics, and convolutional neural networks (CNNs) have made great progress in detecting robotic grasps. However, scenes containing multiple objects can invalidate existing CNN-based grasp detection algorithms, because they lack the manipulation relationships among objects needed to guide the robot to grasp things in the right order. Therefore, manipulation relationships are needed to help the robot better grasp and manipulate objects. This paper prese…

Cited by 60 publications (73 citation statements)
References 37 publications
“…We can see that our model improves the performance of visual manipulation relationship reasoning compared with our previous work [6]. We attribute the improvements to the following changes: (1) unlike our previous work, we use ResNet-101 as the feature extractor (the "backbone") instead of the ResNet-50 and VGG-16 used in [6]; (2) the object detector is Faster-RCNN from [18] instead of SSD in [19]; (3) the backbone is updated with a multi-task loss function that includes the grasp detection loss.…”
Section: Metrics
confidence: 58%
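To make the backbone-sharing point concrete, here is a minimal, hypothetical PyTorch sketch, not the cited architecture: a ResNet-101 feature extractor shared by placeholder object-detection, grasp-detection, and relationship heads, combined through a weighted multi-task loss whose gradients also update the backbone. The head shapes, output dimensions, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class SharedBackboneMultiTask(nn.Module):
    """Hypothetical sketch: one ResNet-101 backbone feeding three task heads."""

    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet101(weights=None)
        # Keep everything up to the last residual stage as the shared feature extractor.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        self.det_head = nn.Conv2d(2048, 5, kernel_size=1)    # placeholder box/class head
        self.grasp_head = nn.Conv2d(2048, 5, kernel_size=1)  # placeholder 5-dim grasp head
        self.rel_head = nn.Conv2d(2048, 3, kernel_size=1)    # placeholder 3-way relation head

    def forward(self, images):
        feats = self.backbone(images)
        return self.det_head(feats), self.grasp_head(feats), self.rel_head(feats)

def multi_task_loss(det_loss, grasp_loss, rel_loss, w_det=1.0, w_grasp=1.0, w_rel=1.0):
    # A weighted sum: back-propagating through it lets every task,
    # including grasp detection, contribute to the shared backbone update.
    return w_det * det_loss + w_grasp * grasp_loss + w_rel * rel_loss
```

Because all three heads sit on the same feature map, the grasp-detection loss reaches the backbone weights, which is the mechanism the quoted passage describes.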
“…Recent works show that CNNs achieve state-of-the-art performance on visual relationship reasoning [14]-[16]. Unlike general visual relationships, the visual manipulation relationship [6] was proposed to determine the grasping order in object-stacking scenes while accounting for the safety and stability of objects. However, when this algorithm is directly combined with a grasp detection network to solve grasping in object-stacking scenes, there are two main difficulties: 1) it is difficult to correctly match the detected grasps to the detected objects in object-stacking scenes; 2) the cascade structure causes a lot of redundant computation (e.g.…”
Section: B. Visual Manipulation Relationship Reasoning
confidence: 99%
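The first difficulty, matching detected grasps to detected objects, can be illustrated with a simple, hypothetical heuristic (not the cited method): assign each grasp rectangle to the object box that contains its centre. In stacking scenes where boxes overlap heavily, this rule becomes ambiguous, which is exactly the failure mode the passage points to.

```python
def match_grasps_to_objects(grasp_rects, object_boxes):
    """Illustrative heuristic only: assign each grasp (cx, cy, w, h, theta)
    to the first object box (x1, y1, x2, y2) containing its centre;
    a grasp whose centre falls in no box stays unassigned (None)."""
    assignments = []
    for (cx, cy, _w, _h, _theta) in grasp_rects:
        owner = None
        for idx, (x1, y1, x2, y2) in enumerate(object_boxes):
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                owner = idx
                break
        assignments.append(owner)
    return assignments

# Two object boxes, one grasp centred in each, plus one grasp outside both.
objects = [(10, 10, 120, 120), (100, 100, 220, 220)]
grasps = [(60, 60, 40, 20, 0.3), (160, 160, 50, 25, 1.2), (300, 300, 30, 15, 0.0)]
print(match_grasps_to_objects(grasps, objects))  # -> [0, 1, None]
```

When object boxes overlap, as they usually do in stacks, a grasp centred in the overlap region is assigned essentially arbitrarily, which is one reason such post-hoc matching degrades as scenes get denser.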
“…On the Cornell dataset, the accuracy of our scheme under the image-wise and object-wise splits is 97.12% and 95.89%, respectively, which is on par with the most advanced grasp detection algorithms. To verify the scheme in multi-object scenarios, we used the VMRD dataset [14], which contains multi-object scenes, and obtained an accuracy of 74.3%. As shown in Figure 1, our method is also applied to real robot grasping tasks.…”
Section: Introduction
confidence: 99%
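For context, the two Cornell evaluation protocols mentioned above are commonly called image-wise and object-wise splits. The sketch below illustrates the difference under the assumption that each sample is an (image_id, object_id) pair; this format is chosen for illustration only and is not the dataset's actual layout.

```python
import random
from collections import defaultdict

def image_wise_split(samples, test_frac=0.2, seed=0):
    """Image-wise: shuffle images, so other images of a test object
    may still appear in the training set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def object_wise_split(samples, test_frac=0.2, seed=0):
    """Object-wise: all images of an object go to the same side, so test
    objects are completely unseen during training (the harder protocol)."""
    by_object = defaultdict(list)
    for image_id, object_id in samples:
        by_object[object_id].append((image_id, object_id))
    object_ids = list(by_object)
    rng = random.Random(seed)
    rng.shuffle(object_ids)
    test_ids = set(object_ids[:int(len(object_ids) * test_frac)])
    train, test = [], []
    for oid, items in by_object.items():
        (test if oid in test_ids else train).extend(items)
    return train, test
```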