2018
DOI: 10.1007/978-3-030-01249-6_34

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Abstract: We introduce a fast and efficient convolutional neural network, ESPNet, for semantic segmentation of high-resolution images under resource constraints. ESPNet is based on a new convolutional module, efficient spatial pyramid (ESP), which is efficient in terms of computation, memory, and power. ESPNet is 22 times faster (on a standard GPU) and 180 times smaller than the state-of-the-art semantic segmentation network PSPNet [1], while its category-wise accuracy is only 8% less. We evaluated ESPNet on a variety …

Citations: cited by 703 publications (494 citation statements)
References: 59 publications
“…In Table V, the mIOU of the main categories of Cityscapes test set are listed and one can easily observe that the most common categories in the dataset have the highest mIOU score. The results of LiteSeg are displayed in Figure 3 for qualitative analysis against ESPNet [13] and ERFNet [12].…”
Section: E Cityscapes Benchmark Results (mentioning)
confidence: 99%
“…Both the inference time, which reflects the real-time performance, and number of parameters, which reflects … These results clearly show the ability of LiteSeg to generate different lightweight models that trade off accuracy against computational efficiency by using different backbone networks. For example, using 640 × 360 input resolution, LiteSeg with MobileNetV2 [23] as a backbone network achieved a speed of 161 FPS, which exceeds the speed of ESPNet [13] by 17 FPS on the same machine, while improving accuracy by 7.51%.…”
Section: Computational Performance Evaluation (mentioning)
confidence: 99%
“…Due to the efficiency of ENet, it can be used for tasks requiring low-latency operations. Efficient Spatial Pyramid Network (ESPNet) [50] and Efficient Residual Factorized Network (ERFNet) [28] are two other efficient real-time semantic segmentation methods, which are faster and more accurate than ENet while using a similar number of parameters. In particular, ESPNet makes use of the Efficient Spatial Pyramid module (ESP), which follows the convolution factorization principle that decomposes a standard convolution into a pointwise convolution and a spatial pyramid of atrous convolutions.…”
Section: B Real-time Semantic Segmentation Methods (mentioning)
confidence: 99%
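The factorization described in the statement above (a pointwise convolution followed by a spatial pyramid of atrous convolutions) can be illustrated with a rough parameter count. This is a minimal sketch, not the authors' implementation. It assumes square kernels, no bias terms, K parallel branches that each map N/K channels to N/K channels, and dilation rates of 2^(k-1) for branch k. All function and parameter names are hypothetical.

```python
def standard_conv_params(in_ch, out_ch, kernel):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_ch * out_ch


def esp_params(in_ch, out_ch, kernel, branches):
    """Weights in an ESP-style factorized block (assumed layout, bias ignored).

    Step 1: a 1x1 (pointwise) convolution reduces in_ch to d = out_ch / branches.
    Step 2: `branches` parallel dilated kernel x kernel convolutions, each d -> d.
    """
    if out_ch % branches != 0:
        raise ValueError("out_ch must be divisible by branches")
    d = out_ch // branches
    pointwise = in_ch * d
    pyramid = branches * kernel * kernel * d * d
    return pointwise + pyramid


def dilated_extent(kernel, dilation):
    """Side length of the window covered by one dilated kernel."""
    return (kernel - 1) * dilation + 1


if __name__ == "__main__":
    m, n, k, branches = 64, 64, 3, 4  # illustrative sizes, not from the source
    print("standard:", standard_conv_params(m, n, k))
    print("factorized:", esp_params(m, n, k, branches))
    # Assumed dilation schedule: 1, 2, 4, 8 for four branches.
    rates = [2 ** i for i in range(branches)]
    print("extents:", [dilated_extent(k, r) for r in rates])
```

Under these assumptions the factorized block needs far fewer weights than the standard convolution, while its largest branch still covers a wide window. That is the trade-off the quoted statement refers to.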
“…Therefore, the segmentation accuracy can be greatly improved without increasing much computational burden. On the other hand, the gridding issue caused by the atrous convolution operations [19], [50] can be alleviated to some extent (see Fig. 7 for an illustration).…”
Section: B Lightweight Baseline Network With Atrous Convolution And… (mentioning)
confidence: 99%
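The gridding issue mentioned in the statement above can be seen in one dimension by tracking which input positions can reach a single output through a stack of dilated convolutions. The sketch below is an illustration under stated assumptions (odd kernel size, stride 1, layers applied one after another). It is not taken from any of the cited papers, and the function names are hypothetical.

```python
def contributing_offsets(kernel, dilations):
    """Input offsets (relative to one output position) that a stack of
    1-D dilated convolutions can read. Assumes an odd kernel size."""
    half = kernel // 2
    offsets = {0}
    for rate in dilations:
        taps = [rate * t for t in range(-half, half + 1)]
        # Each layer shifts every reachable offset by each of its taps.
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


def has_gridding(kernel, dilations):
    """True if some positions inside the covered span are never read."""
    offsets = contributing_offsets(kernel, dilations)
    span = offsets[-1] - offsets[0] + 1
    return len(offsets) < span


if __name__ == "__main__":
    # Two layers with the same rate skip every odd offset.
    print(contributing_offsets(3, [2, 2]), has_gridding(3, [2, 2]))
    # Mixing rates 1 and 2 covers the span without holes.
    print(contributing_offsets(3, [1, 2]), has_gridding(3, [1, 2]))
```

In this toy setting, repeating the same dilation rate leaves holes in the covered span, while combining different rates fills them. This is one intuition for why combining the outputs of branches with different rates can alleviate gridding.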