Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

Wu, Zifeng; Shen, Chunhua; Hengel, Anton van den

doi:10.1016/j.patcog.2019.01.006

Cited by 1,106 publications

(553 citation statements)

References 43 publications

Supporting

Mentioning

545

Contrasting

Unclassified

“…In this experiment, following previous works [31,36,34] without COCO pre-training, we train our model on SBD [10] and then fine-tune it on official trainval set. We use the same training protocol as described in the main paper.…”

Section: PASCAL VOC Without COCO Pre-training (mentioning)

confidence: 99%

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

Tian, Shen, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this oversimple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear, which takes advantages of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer lies in that with a relatively lower-resolution feature map such as 1/16 or 1/32 of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the DUpsampling based decoder's flexibility in leveraging almost arbitrary combinations of the CNN encoders' features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder, with only ∼20% of computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1% mIOU on PASCAL VOC with 30% computation of the previously best model; and 52.5% mIOU on PASCAL Context.

“…Table 7: State-of-the-art methods on PASCAL VOC test set without COCO pre-training.

Method | mIOU (%)
DPN [20] | 74.1
Piecewise [17] | 75.3
ResNet-38 [31] | 82.5
PSPNet [36] | 82.6
DFN [32] | 82.7
EncNet [34] | 82.9
Our proposed (Xception-65) | 85.3
…”

Section: PASCAL VOC Without COCO Pre-training (mentioning)

confidence: 99%

Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation

Tian, Shen, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

“…Gensim package (Version 3.4) [40] was used in training the Word2Vec model. [41] & Inception [42]. The tested CNN architectures use 1D convolution instead of 2D convolution.…”

Section: E Word2vec Model Training (mentioning)

confidence: 99%

SpeciesMLP: Sequence based Multi-layer Perceptron for Amplicon Read Classification Using Real-time Data Augmentation

Kishk, El-Hadidi 2018

Preprint

Taxonomic assignment is the core of targeted metagenomics approaches that aim to assign sequencing reads to their corresponding taxonomy. Sequence similarity searching and machine learning (ML) are two commonly used approaches for taxonomic assignment based on the 16S rRNA. Similarity-based approaches require high computation resources, while ML approaches don't need these resources in prediction. The majority of these ML approaches depend on k-mer frequency rather than direct sequence, which leads to low accuracy on short reads as k-mer frequency doesn't consider k-mer position. Moreover, training ML taxonomic classifiers depends on a specific read length, which may reduce prediction performance as read length decreases. In this study, we built a neural network classifier for 16S rRNA reads based on the SILVA database (version 132). Modeling was performed on direct sequences using a Convolutional Neural Network (CNN) and other neural network architectures such as Multi-layer Perceptron and Recurrent Neural Network. In order to reduce modeling time of the direct sequences, in-silico PCR was applied on the SILVA database. A total of 14 subset databases were generated by universal primers for each single or paired high variable region (HVR). Moreover, in this study, we illustrate the results for the V2 database model on 8443 classes on the species level and 1552 on the genus level. In order to simulate sequencing fragmentation, we trained on variable-length subsequences, from 50 bases up to the full length of the HVR, that change randomly in each training iteration. A simple MLP model with global max pooling gives 0.71 & 0.93 test accuracy for the species and genus levels respectively (for reads of 100 base subsequences) and 0.75 & 0.96 accuracy for the species and genus levels respectively (on the full length V2 HVR). In this study, we present a novel method (SpeciesMLP https://github.com/ali-kishk/SpeciesMLP) to model the direct amplicon sequence using an MLP over a sequence of k-mers, which is 20 times faster than the CNN in training and 10 times faster in prediction.

“…To illustrate this dilemma, Fig. 1 gives the accuracy (mIoU) and inference speed (frames per second (fps)) obtained by several state-of-the-art methods, including FCN-8s [9], CRF-RNN [17], DeepLab [10], DeepLabv2 [12], DeepLabv3+ [13], ResNet-38 [18], PSPNet [11], DUC [19], RefineNet [20], LRR [21], DPN [22], FRRN [23], TwoColumn [24], SegNet [25], SQNet [26], ENet [27], ERFNet [28], ICNet [29], SwiftNetRN [30], LEDNet [31], BiSeNet1 [32], BiSeNet2 [32], DFANet [33] and our proposed method, on the Cityscapes test dataset. Clearly, how to achieve a good tradeoff between accuracy and speed is still a challenging problem.…”

Section: Introduction (mentioning)

confidence: 99%

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

Dong, Yan, Shen, et al. 2021

IEEE Trans. Intell. Transport. Syst.

Self Cite

Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to the use of complex network architectures. This greatly limits their applications in the real-world scenarios that require real-time processing. In this paper, we propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes, which achieves a good trade-off between accuracy and speed. Specifically, a Lightweight Baseline Network with Atrous convolution and Attention (LBN-AA) is firstly used as our baseline network to efficiently obtain dense feature maps. Then, the Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits the different sizes of pooling operations to encode the rich and distinctive semantic information, is developed to detect objects at multiple scales. Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information. Finally, a simple but practical Feature Fusion Network (FFN) is used to effectively combine both shallow and deep features from the semantic branch (DASPP) and the spatial branch (SPN), respectively. Extensive experimental results show that the proposed method respectively achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps on the challenging Cityscapes and CamVid test datasets (by only using a single NVIDIA TITAN X card). This demonstrates that the proposed method offers excellent performance at the real-time speed for semantic segmentation of urban street scenes.
