2015 IEEE International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra.2015.7139363
RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features

Cited by 275 publications (193 citation statements) | References 10 publications
“…Asif et al. [1] report improved recognition performance using a cascade of Random Forest classifiers that are fused in a hierarchical manner. Finally, in recent independent work, Schwarz et al. [20] proposed to use features extracted from CNNs pre-trained on ImageNet for RGB-D object recognition. While they also make use of a two-stream network, they do not fine-tune the CNN for RGB-D recognition, but rather use the pre-trained network as is.…”
Section: Related Work
confidence: 99%
See 1 more Smart Citation
“…Asif et al [1] report improved recognition performance using a cascade of Random Forest classifiers that are fused in a hierarchical manner. Finally, in recent independent work Schwarz et al [20] proposed to use features extracted from CNNs pre-trained on ImageNet for RGB-D object recognition. While they also make use of a two-stream network they do not fine-tune the CNN for RGB-D recognition, but rather just use the pre-trained network as is.…”
Section: Related Workmentioning
confidence: 99%
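The two-stream idea described above (one network per modality, with the pre-trained extractors used as fixed feature generators) can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the "pre-trained" extractors here are stand-in random projections with ReLU, playing the role of frozen fc7-style CNN layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "pre-trained" feature extractors: fixed random projections
# acting as frozen CNN layers producing 4096-d features per modality.
W_rgb = rng.standard_normal((4096, 3 * 32 * 32)) * 0.01
W_depth = rng.standard_normal((4096, 1 * 32 * 32)) * 0.01

def extract_features(rgb, depth):
    """Run each modality through its own frozen extractor and
    concatenate the resulting feature vectors (two-stream fusion)."""
    f_rgb = np.maximum(W_rgb @ rgb.ravel(), 0.0)        # ReLU features
    f_depth = np.maximum(W_depth @ depth.ravel(), 0.0)  # ReLU features
    return np.concatenate([f_rgb, f_depth])             # 8192-d fused vector

rgb = rng.random((3, 32, 32))    # toy RGB crop
depth = rng.random((1, 32, 32))  # toy depth crop
features = extract_features(rgb, depth)
print(features.shape)  # (8192,)
```

The fused vector would then feed a downstream classifier (e.g. an SVM or softmax layer); fine-tuning, as the excerpt notes, is what distinguishes later end-to-end variants from this frozen-feature approach.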
“…
Method                 RGB           Depth         RGB-D
[14]                   77.7 ± 1.9    78.8 ± 2.7    86.2 ± 2.1
CKM Desc. [3]          N/A           N/A           86.4 ± 2.3
CNN-RNN [22]           80.8 ± 4.2    78.9 ± 3.8    86.8 ± 3.3
Upgraded HMP [5]       82.4 ± 3.1    81.2 ± 2.3    87.5 ± 2.9
CaRFs [1]              N/A           N/A           88.1 ± 2.4
CNN Features [20]      83.1 ± 2.0    N/A           89.4 ± 1.3
Ours, Fus-CNN (HHA)    84.1 ± 2.7    83.0 ± 2.7    91.0 ± 1.9
Ours, Fus-CNN (jet)    84.1 ± 2.7    83.8 ± 2.7    91.3 ± 1.4

… a preliminary experiment. A fixed momentum value of 0.9 and a mini-batch size of 128 were used for all experiments unless stated otherwise.…”
Section: A. Experimental Setup
confidence: 99%
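The optimizer settings quoted above (momentum 0.9, mini-batch size 128) correspond to standard SGD with momentum. A minimal sketch of the update rule on a toy quadratic loss, with an illustrative learning rate not taken from the paper:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: v <- mu*v - lr*g, then w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(np.linalg.norm(w))  # decays toward 0 with damped oscillation
```

In practice the gradient would be averaged over a mini-batch of 128 examples rather than computed from a closed-form loss; momentum 0.9 smooths those noisy per-batch gradients.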
“…Features from the colour and depth channels were learned separately and then concatenated for use in the final softmax classifier. Schwarz et al. [25] proposed using two pretrained CNNs to extract features from colour and depth images individually. Then, Eitel et al. [12] proposed a structure similar to that of Schwarz et al. [25]; the difference was that, in the latter, the fusion CNNs were trained end-to-end on the RGB-D data, which gives higher accuracy.…”
Section: Related Work
confidence: 99%
“…Schwarz et al. [25] proposed using two pretrained CNNs to extract features from colour and depth images individually. Then, Eitel et al. [12] proposed a structure similar to that of Schwarz et al. [25]; the difference was that, in the latter, the fusion CNNs were trained end-to-end on the RGB-D data, which gives higher accuracy. Bai et al. [26] proposed dividing the input images into several subsets according to their shapes and colours.…”
Section: Related Work
confidence: 99%
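The "jet" variant appearing in the results table refers to rendering the single-channel depth map as a three-channel colour image so that a network pre-trained on RGB photos can consume it. A minimal sketch using a hand-rolled piecewise-linear jet-like ramp; the exact colour map and normalization used in the paper may differ:

```python
import numpy as np

def depth_to_jet(depth):
    """Normalize a depth map to [0, 1] and map it through a simple
    jet-like colour ramp (blue -> green -> red), yielding an HxWx3
    uint8 image suitable as input to an RGB-pretrained CNN."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)
    # Piecewise-linear approximation of the jet colour map.
    r = np.clip(1.5 - np.abs(4.0 * d - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * d - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * d - 1.0), 0.0, 1.0)
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

depth = np.linspace(0.5, 4.0, 16).reshape(4, 4)  # toy depth values in metres
img = depth_to_jet(depth)
print(img.shape, img.dtype)  # (4, 4, 3) uint8
```

Near pixels come out blue-dominant and far pixels red-dominant, so depth discontinuities become colour edges the pre-trained filters already respond to; the HHA variant in the table encodes geometric quantities (height, angle, disparity) instead of a colour ramp.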
“…While most methods use only 2D visual information [2], there are numerous 3D shape-based recognition techniques [3,4], as well as methods that use both visual and shape information [5,6]. Object detection methods are essential for scene understanding [7], which has a number of applications in different fields, such as robotics [1] or augmented reality [8].…”
Section: Introduction
confidence: 99%