2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00474

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure…

Cited by 17,164 publications (9,859 citation statements)
References 23 publications

“…Following the pioneering work (Sandler et al., 2018; Howard et al., 2019), we design a Gather-and-Expansion Layer, as discussed in Section 4.2 and illustrated in Figure 5. The main improvements are two-fold: (i) we adopt one 3 × 3 convolution as the Gather Layer instead of the point-wise convolution in the inverted bottleneck of MobileNetV2 (Sandler et al., 2018); (ii) when stride = 2, we employ two 3 × 3 depth-wise convolutions to substitute for one 5 × 5 depth-wise convolution. Table 4b shows the improvement of our block design.…”
Section: Ablative Evaluation on Cityscapes (mentioning)
confidence: 99%
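As a rough illustration of the block designs discussed in this quote, here is a minimal PyTorch-style sketch (hypothetical code, not taken from either cited paper): an inverted bottleneck in the spirit of MobileNetV2, and a gather-and-expansion variant that follows the two modifications described above, i.e. a 3 × 3 gather convolution in place of the point-wise one, and two 3 × 3 depth-wise convolutions for the stride-2 case. All class and argument names are illustrative.

```python
# Hypothetical sketch of the two block designs discussed above (not the authors' code).
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel, stride=1, groups=1):
    """Convolution followed by BatchNorm and ReLU6."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride, kernel // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU6(inplace=True),
    )


class InvertedBottleneck(nn.Module):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depth-wise -> 1x1 linear project."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion  # expansion ratio sets the inner width
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            conv_bn_relu(in_ch, hidden, 1),                          # point-wise "gather"
            conv_bn_relu(hidden, hidden, 3, stride, groups=hidden),  # depth-wise
            nn.Conv2d(hidden, out_ch, 1, bias=False),                # linear bottleneck
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


class GatherExpansionBlock(nn.Module):
    """Variant sketched from the quote: a 3x3 convolution as the gather layer; for
    stride = 2, two 3x3 depth-wise convolutions stand in for one 5x5 depth-wise."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        layers = [conv_bn_relu(in_ch, hidden, 3)]  # 3x3 gather layer
        if stride == 2:
            layers += [
                conv_bn_relu(hidden, hidden, 3, 2, groups=hidden),
                conv_bn_relu(hidden, hidden, 3, 1, groups=hidden),
            ]
        else:
            layers += [conv_bn_relu(hidden, hidden, 3, 1, groups=hidden)]
        layers += [nn.Conv2d(hidden, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch)]
        self.block = nn.Sequential(*layers)
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```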
“…It has an advantage in memory access cost (Sandler et al., 2018; Howard et al., 2019). The expansion ratio controls the output dimension of this layer.…”
Section: Ablative Evaluation on Cityscapes (mentioning)
confidence: 99%
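To make the role of the expansion ratio concrete, the short sketch below works through back-of-the-envelope arithmetic for how the ratio sets the inner width of such a bottleneck block; the numbers and function name are illustrative only.

```python
# Illustrative arithmetic for how the expansion ratio t sets the inner width of an
# inverted bottleneck (example values only; biases and BatchNorm parameters ignored).
def inverted_bottleneck_weights(c_in, c_out, t, k=3):
    hidden = c_in * t              # the expansion ratio controls this width
    expand = c_in * hidden         # 1x1 point-wise expansion weights
    depthwise = hidden * k * k     # k x k depth-wise weights (one filter per channel)
    project = hidden * c_out       # 1x1 linear projection weights
    return hidden, expand + depthwise + project

for t in (1, 4, 6):
    hidden, weights = inverted_bottleneck_weights(64, 64, t)
    print(f"t={t}: hidden width={hidden}, ~{weights:,} weights")
```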
“…They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components. […] of the ResNet-50 [28], and meanwhile save more than 75% of the parameters and 50% of the computational time. In the literature, approaches for compressing deep networks can be classified into five categories: parameter pruning [26,29,30,31], parameter quantizing [32,33,34,35,36,37,38,39,40,41], low-rank parameter factorization [42,43,44,45,46], transferred/compact convolutional filters [47,48,49,50], and knowledge distillation [51,52,53,54,55,56]. Parameter pruning and quantizing mainly focus on eliminating redundancy in the model parameters, respectively by removing redundant/uncritical parameters or by compressing the parameter space (e.g.…”
mentioning
confidence: 99%
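To make the first of these categories concrete, below is a generic sketch of magnitude-based parameter pruning in PyTorch; it is illustrative only and not the method of any work cited in this passage.

```python
# Toy magnitude-based pruning of a single convolution layer (generic illustration).
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Keep only the largest 25% of weights by absolute value; zero out the rest.
with torch.no_grad():
    w = conv.weight
    threshold = w.abs().flatten().kthvalue(int(0.75 * w.numel())).values
    mask = (w.abs() > threshold).float()
    w.mul_(mask)

sparsity = 1.0 - mask.mean().item()
print(f"pruned {sparsity:.0%} of the weights")
# In practice the mask is stored and re-applied after each training step
# so that the pruned weights stay at zero during fine-tuning.
```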
“…All existing results show performance with a VGG-16-based model. We train a MobileNet-based model, which has been shown to achieve similar performance to VGG-16 (71.8% vs. 71.5% Top-1 accuracy on ImageNet) while requiring fewer computational resources [25,51]. Our fully supervised implementation pretrained on ImageNet achieves 69.6% mIOU on Pascal VOC 2012 [17]; in comparison, the reference DeepLab-VGG16 model achieves 68.7% mIOU [12] and the re-implementation in [36] […]…”
Section: Weakly Supervised Segmentation Comparison (mentioning)
confidence: 99%