2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.394

DAG-Recurrent Neural Networks for Scene Labeling

Abstract: Figure 1: With the local representations extracted from Convolutional Neural Networks (CNNs), the 'sand' pixels (in the first image) are likely to be misclassified as 'road', and the 'building' pixels (in the second image) are easily confused with 'streetlight'. Our DAG-RNN significantly boosts the discriminative power of local representations by modeling their contextual dependencies. As a result, it produces smoother and more semantically meaningful labeling maps. The figure is best viewed i…

Cited by 122 publications (140 citation statements). References 34 publications.
“…23. http://www.semantic3d.net/

Method | Base network | Contribution
… [43] | GoogLeNet (FCN) | Patchwise CNN, standalone CRF
CRFasRNN [70] | FCN-8s | CRF reformulated as RNN
Dilation [71] | VGG-16 | Dilated convolutions
ENet [72] | ENet bottleneck | Bottleneck module for efficiency
Multi-scale-CNN-Raj [73] | VGG-16 (FCN) | Multi-scale architecture
Multi-scale-CNN-Eigen [74] | Custom | Multi-scale sequential refinement
Multi-scale-CNN-Roy [75] | Multi-scale-CNN-Eigen | Multi-scale coarse-to-fine refinement
Multi-scale-CNN-Bian [76] | FCN | Independently trained multi-scale FCNs
ParseNet [77] | VGG-16 | Global context feature fusion
ReSeg [78] | VGG-16 + ReNet | Extension of ReNet to semantic segmentation
LSTM-CF [79] | Fast R-CNN + DeepMask | Fusion of contextual information from multiple sources
2D-LSTM [80] | MDRNN | Image context modelling
rCNN [81] | MDRNN | Different input sizes, image context
DAG-RNN [82] | Elman network | Graph image structure for context modelling
SDS [10] | R-CNN + Box CNN | Simultaneous detection and segmentation
DeepMask [83] | VGG-A | Proposal generation for segmentation
SharpMask [84] | DeepMask | Top-down refinement module
MultiPathNet [85] | Fast R-CNN + DeepMask | Multi-path information flow through network
Huang-3DCNN [86] | Own 3DCNN | 3DCNN for voxelized point clouds
PointNet [87] | Own MLP-based | Segmentation of unordered point sets
Clockwork Convnet [88] | FCN | Clockwork scheduling for sequences
3DCNN-Zhang…”
Section: Methods (mentioning confidence: 99%)
“…For example, existing RNN models mainly focus on either sequence-structured inputs, such as Long Short-Term Memory (LSTM) [17] and GRU, or tree-structured inputs, such as Tree-LSTM [18]. A handful of RNN models target static DAGs in different application domains; e.g., DAG-RNN [19], [20] models each 2D image as a DAG for scene labeling, while RNN-LE [21] models each contact map over a protein's amino acids as a DAG for protein structure prediction. However, both DAG-RNN and RNN-LE are based on the plain RNN architecture, and are unable to capture the peculiarities of a diffusion process.…”
Section: Introduction (mentioning confidence: 99%)
“…h(t) is the hidden state of the t-th subsequence, and its information is transferred to future subsequences through the matrix U_hh. Here we do not employ an additional softmax layer to normalize the output o into a probability vector, as done in other RNN architectures [9], [45]. This is because the softmax layer is unsuitable for our soft-regression model: the elements in the output of the softmax operator sum to 1, while the elements in α(t)y_i sum to α(t), whose value lies within [0, 1].…”
Section: A Soft RNN (SRNN) Regression Based Early Action Prediction (mentioning confidence: 99%)
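The arithmetic behind the softmax objection can be checked directly. A toy sketch follows, assuming a one-hot ground-truth vector y_i and a hypothetical confidence weight alpha_t; the concrete values are illustrative and not from the cited paper:

```python
# Sketch: softmax outputs always sum to 1, so they cannot match a
# soft-regression target alpha(t)*y_i whose elements sum to alpha(t).
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

y_i = np.array([0.0, 1.0, 0.0])  # one-hot ground-truth class vector
alpha_t = 0.4                    # assumed confidence alpha(t) in [0, 1]
target = alpha_t * y_i           # soft target: elements sum to 0.4

o = np.array([0.3, 2.0, -1.0])   # raw RNN output for the t-th subsequence
print(softmax(o).sum())          # 1.0 for any o
print(target.sum())              # 0.4
```

Since no input to softmax can yield an output summing to 0.4, the soft-regression model has to leave the output o unnormalized, which is the design choice the quoted passage defends.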
“…Recurrent Neural Networks (RNNs) have been widely used to address sequential prediction problems in the literature, such as speech recognition [9], human action/activity recognition [34], [41], scene labeling [39], [45], image captioning [19], and object segmentation [31]. RNN and its variants LSTM [34], GRNN [4], etc.…”
Section: Related Work (mentioning confidence: 99%)