RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Gao, Mingliang; Jiang, Jun; Zou, Guofeng; John, Vijay; Liu, Zheng

doi:10.1109/access.2019.2907071

Cited by 48 publications

(25 citation statements)

References 183 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…To perform image classification directly, the authors of [217,218] suggested the possibility of using multi-stream CNNs (i.e., two or more stream CNNs) to extract robust features from a final hidden layer and then project them onto a common representation space. However, the most commonly adopted approaches involve concatenating a set of pre-trained features derived from the huge ImageNet dataset to generate a multimodal representation [216].…”

Section: Convolutional Neural Network Basedmentioning

confidence: 99%

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

et al. 2021

View full text Add to dashboard Cite

The research progress in multimodal learning has grown rapidly over the last decade in several areas, especially in computer vision. The growing potential of multimodal data streams and deep learning algorithms has contributed to the increasing universality of deep multimodal learning. This involves the development of models capable of processing and analyzing the multimodal information uniformly. Unstructured real-world data can inherently take many forms, also known as modalities, often including visual and textual content. Extracting relevant patterns from this kind of data is still a motivating goal for researchers in deep learning. In this paper, we seek to improve the understanding of key concepts and algorithms of deep multimodal learning for the computer vision community by exploring how to generate deep models that consider the integration and combination of heterogeneous visual cues across sensory modalities. In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based schemes), multitask learning, multimodal alignment, multimodal transfer learning, and zero-shot learning. We also survey current multimodal applications and present a collection of benchmark datasets for solving problems in various vision domains. Finally, we highlight the limitations and challenges of deep multimodal learning and provide insights and directions for future research.

show abstract

Section: Convolutional Neural Network Basedmentioning

confidence: 99%

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

et al. 2021

View full text Add to dashboard Cite

show abstract

“…Negin et al 28 used the CNN model to make a simple exploration of static gesture recognition. Inspired by Wang's work, Gao et al 29 proposed a recognition multimodal feature learning method for RGB‐D scene recognition.…”

Section: Related Workmentioning

confidence: 99%

Gesture recognition based on multi‐modal feature weight

Duan

Sun

Cheng

et al. 2020

Concurrency and Computation

View full text Add to dashboard Cite

Summary With the continuous development of sensor technology, the acquisition cost of RGB‐D images is getting lower and lower, and gesture recognition based on depth images and Red‐Green‐Blue (RGB) images has gradually become a research direction in the field of pattern recognition. However, most of the current processing methods for RGB‐D gesture images are relatively simple, ignoring the relationship and influence between its two modes, and unable to make full use of the correlation factors between different modes. In view of the above problems, this paper optimizes the effect of RGB‐D information processing by considering the independent features and related features of multi‐modal data to construct a weight adaptive algorithm to fuse different features. Simulation experiments show that the method proposed in this paper is better than the traditional RGB‐D gesture image processing method and the gesture recognition rate is higher. Comparing the current more advanced gesture recognition methods, the method proposed in this paper also achieves higher recognition accuracy, which verifies the feasibility and robustness of this method.

show abstract

“…In contrast, the 3D camera not only obtains the plane feature information of the object but also acquires the spatial position information. As a result, the unknown object recognition by 3D cameras has become a research hotspot in recent years [ 106 ]. Compared with passive sensors, such as cameras, lidars, etc., the tactile sensor inspired by human touch acquires object feature information by active contact perception.…”

Section: Unknown Objectsmentioning

confidence: 99%

Feature Sensing and Robotic Grasping of Objects with Uncertain Information: A Review

Wang

Zhang

Zang

et al. 2020

Sensors

View full text Add to dashboard Cite

As there come to be more applications of intelligent robots, their task object is becoming more varied. However, it is still a challenge for a robot to handle unfamiliar objects. We review the recent work on the feature sensing and robotic grasping of objects with uncertain information. In particular, we focus on how the robot perceives the features of an object, so as to reduce the uncertainty of objects, and how the robot completes object grasping through the learning-based approach when the traditional approach fails. The uncertain information is classified into geometric information and physical information. Based on the type of uncertain information, the object is further classified into three categories, which are geometric-uncertain objects, physical-uncertain objects, and unknown objects. Furthermore, the approaches to the feature sensing and robotic grasping of these objects are presented based on the varied characteristics of each type of object. Finally, we summarize the reviewed approaches for uncertain objects and provide some interesting issues to be more investigated in the future. It is found that the object’s features, such as material and compactness, are difficult to be sensed, and the object grasping approach based on learning networks plays a more important role when the unknown degree of the task object increases.

show abstract

RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Cited by 48 publications

References 183 publications

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Gesture recognition based on multi‐modal feature weight

Feature Sensing and Robotic Grasping of Objects with Uncertain Information: A Review

Contact Info

Product

Resources

About