Yaoxian Song scite author profile

Robotic arm grasping is a fundamental operation in robotic control tasks. Most current methods for robotic grasping focus on RGB-D policies in the table surface scenario or on 3D point cloud analysis and inference in 3D space. In contrast to these methods, we propose a novel real-time multimodal hierarchical encoder-decoder neural network that fuses RGB and depth data to realize robotic humanoid grasping in 3D space with only partial observation. We consider both the quantification of uncertainty in raw depth data and depth estimation fused with RGB. We develop a general labeling method to annotate ground truth on common RGB-D datasets. We evaluate the effectiveness and performance of our method on a physical robot setup, where it achieves a success rate of over 90% in both table surface and 3D space scenarios. The video is available at https://youtu.be/_iRyLcfbTfg.

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

Yan, Liu, Song, et al. 2020

UG-Net for Robotic Grasping using Only Depth Image

Song, Fei, Cheng, et al. 2019

Group Pressure Leads to Consensus of Hegselmann-Krause Opinion Dynamics

Cheng, Song 2019

Tactile–Visual Fusion Based Robotic Grasp Detection Method with a Reproducible Sensor

Song, Luo, Yu 2021, IJCIS

Robotic grasp detection is a fundamental problem in robotic manipulation. Conventional grasp methods, which use vision information only, can cause damage in force-sensitive tasks. In this paper, we propose a tactile-visual method that uses a reproducible sensor to realize fine-grained, haptic grasping. Although several tactile-based methods exist, they require expensive custom sensors paired with their own specific datasets. To overcome these limitations, we introduce a low-cost and reproducible tactile fingertip and build a general tactile-visual fusion grasp dataset comprising 5,110 grasping trials. We further propose a hierarchical encoder-decoder neural network to predict grasp points and force in an end-to-end manner. We then compare our method with state-of-the-art methods on the benchmark in both vision-based and tactile-visual fusion schemes, and our method outperforms them in most scenarios. Furthermore, we compare our fusion method with the vision-only method in a physical experiment, and the results indicate that our end-to-end method gives the robot a more fine-grained grasp ability, reducing force redundancy by 41%. Our project is available at https://sites.google.com/view/tvgd

An improved target tracking scheme via integrating mean-shift with TLD algorithm

Fan, Huang, et al. 2017

Multimodal Aggregation Approach for Memory Vision-Voice Indoor Navigation with Meta-Learning

Yan, Liu, Song, et al. 2020, Preprint


Yaoxian Song

Face recognition based on convolution neural network

Deep Robotic Grasping Prediction with Hierarchical RGB-D Fusion
