2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2018.00474

MobileNetV2: Inverted Residuals and Linear Bottlenecks

Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure…

Cited by 17,164 publications (9,859 citation statements)
References 23 publications

“…Following the pioneering work (Sandler et al., 2018; Howard et al., 2019), we design a Gather-and-Expansion Layer, as discussed in Section 4.2 and illustrated in Figure 5. The main improvements are two-fold: (i) we adopt one 3 × 3 convolution as the Gather Layer instead of the point-wise convolution in the inverted bottleneck of MobileNetV2 (Sandler et al., 2018); (ii) when stride = 2, we employ two 3 × 3 depth-wise convolutions to substitute for one 5 × 5 depth-wise convolution. Table 4b shows the improvement of our block design.…”
Section: Ablative Evaluation on Cityscapes (mentioning)
confidence: 99%
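As a rough illustration of the block designs discussed in this quote, here is a minimal PyTorch-style sketch (hypothetical code, not taken from either cited paper): an inverted bottleneck in the spirit of MobileNetV2, and a gather-and-expansion variant that follows the two modifications described above, i.e. a 3 × 3 gather convolution in place of the point-wise one, and two 3 × 3 depth-wise convolutions for the stride-2 case. All class and argument names are illustrative.

```python
# Hypothetical sketch of the two block designs discussed above (not the authors' code).
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel, stride=1, groups=1):
    """Convolution followed by BatchNorm and ReLU6."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride, kernel // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU6(inplace=True),
    )


class InvertedBottleneck(nn.Module):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depth-wise -> 1x1 linear project."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion  # expansion ratio sets the inner width
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            conv_bn_relu(in_ch, hidden, 1),                          # point-wise "gather"
            conv_bn_relu(hidden, hidden, 3, stride, groups=hidden),  # depth-wise
            nn.Conv2d(hidden, out_ch, 1, bias=False),                # linear bottleneck
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


class GatherExpansionBlock(nn.Module):
    """Variant sketched from the quote: a 3x3 convolution as the gather layer; for
    stride = 2, two 3x3 depth-wise convolutions stand in for one 5x5 depth-wise."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        layers = [conv_bn_relu(in_ch, hidden, 3)]  # 3x3 gather layer
        if stride == 2:
            layers += [
                conv_bn_relu(hidden, hidden, 3, 2, groups=hidden),
                conv_bn_relu(hidden, hidden, 3, 1, groups=hidden),
            ]
        else:
            layers += [conv_bn_relu(hidden, hidden, 3, 1, groups=hidden)]
        layers += [nn.Conv2d(hidden, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch)]
        self.block = nn.Sequential(*layers)
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```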
“…It has an advantage in memory access cost (Sandler et al., 2018; Howard et al., 2019). The expansion ratio controls the output dimension of this layer.…”
Section: Ablative Evaluation on Cityscapes (mentioning)
confidence: 99%
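To make the role of the expansion ratio concrete, the short sketch below works through back-of-the-envelope arithmetic for how the ratio sets the inner width of such a bottleneck block; the numbers and function name are illustrative only.

```python
# Illustrative arithmetic for how the expansion ratio t sets the inner width of an
# inverted bottleneck (example values only; biases and BatchNorm parameters ignored).
def inverted_bottleneck_weights(c_in, c_out, t, k=3):
    hidden = c_in * t              # the expansion ratio controls this width
    expand = c_in * hidden         # 1x1 point-wise expansion weights
    depthwise = hidden * k * k     # k x k depth-wise weights (one filter per channel)
    project = hidden * c_out       # 1x1 linear projection weights
    return hidden, expand + depthwise + project

for t in (1, 4, 6):
    hidden, weights = inverted_bottleneck_weights(64, 64, t)
    print(f"t={t}: hidden width={hidden}, ~{weights:,} weights")
```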
“…They visualize the feature maps extracted by different filters and view each filter as a visual unit focusing on different visual components. […] of the ResNet-50 [28], and meanwhile save more than 75% of the parameters and 50% of the computational time. In the literature, approaches for compressing deep networks can be classified into five categories: parameter pruning [26,29,30,31], parameter quantizing [32,33,34,35,36,37,38,39,40,41], low-rank parameter factorization [42,43,44,45,46], transferred/compact convolutional filters [47,48,49,50], and knowledge distillation [51,52,53,54,55,56]. Parameter pruning and quantizing mainly focus on eliminating redundancy in the model parameters, respectively by removing redundant/uncritical parameters or by compressing the parameter space (e.g.…”
mentioning
confidence: 99%
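To make the first of these categories concrete, below is a generic sketch of magnitude-based parameter pruning in PyTorch; it is illustrative only and not the method of any work cited in this passage.

```python
# Toy magnitude-based pruning of a single convolution layer (generic illustration).
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Keep only the largest 25% of weights by absolute value; zero out the rest.
with torch.no_grad():
    w = conv.weight
    threshold = w.abs().flatten().kthvalue(int(0.75 * w.numel())).values
    mask = (w.abs() > threshold).float()
    w.mul_(mask)

sparsity = 1.0 - mask.mean().item()
print(f"pruned {sparsity:.0%} of the weights")
# In practice the mask is stored and re-applied after each training step
# so that the pruned weights stay at zero during fine-tuning.
```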
“…All existing results show performance with a VGG-16-based model. We train a MobileNet-based model, which has been shown to achieve similar performance to VGG-16 (71.8% vs. 71.5% Top-1 accuracy on ImageNet) while requiring fewer computational resources [25,51]. Our fully supervised implementation pretrained on ImageNet achieves 69.6% mIOU on Pascal VOC 2012 [17]; in comparison, the reference DeepLab-VGG16 model achieves 68.7% mIOU [12] and the re-implementation in [36] […]…”
Section: Weakly Supervised Segmentation Comparison (mentioning)
confidence: 99%