ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Mehta, Sachin; Rastegari, Mohammad; Caspi, Anat; Shapiro, Linda G.; Hajishirzi, Hannaneh

doi:10.1007/978-3-030-01249-6_34

Cited by 703 publications

(494 citation statements)

References 59 publications

Supporting

Mentioning

462

Contrasting

Unclassified


“…In Table V, the mIOU of the main categories of Cityscapes test set are listed and one can easily observe that the most common categories in the dataset have the highest mIOU score. The results of LiteSeg are displayed in Figure 3 for qualitative analysis against ESPNet [13] and ERFNet [12].…”

Section: E Cityscapes Benchmark Results (mentioning)

confidence: 99%
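The statement above reports per-category scores and their mean. As a minimal sketch of how intersection over union (IoU) per class and its mean (mIoU) are typically computed, the snippet below works on flat lists of integer labels. The function names `per_class_iou` and `mean_iou` and the toy labels are illustrative assumptions, not code from LiteSeg or ESPNet; benchmark tooling usually accumulates counts over a whole dataset instead of a single image.

```python
def per_class_iou(pred, target, num_classes):
    """Return {class_id: IoU} for every class present in pred or target."""
    ious = {}
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label maps
            ious[c] = inter / union
    return ious


def mean_iou(pred, target, num_classes):
    """Average the per-class IoU values (0.0 if no class is present)."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
class_scores = per_class_iou(pred, target, 3)
miou = mean_iou(pred, target, 3)
```

In this toy example the class with the most ground-truth pixels (class 1) also gets the highest IoU, which mirrors the observation in the quoted statement only by construction.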

“…Both the inference time, which reflects the realtime performance, and number of parameters, which reflects These results clearly show the ability of LiteSeg to generate different lightweight models to manipulate the accuracy and computational efficiency by using different backbone network. For example, using 640 × 360 input resolution, LiteSeg with MobileNetV2 [23] as a backbone network achieved a speed of 161 FPS which exceeds the speed of ESPNet [13] by 17 FPS on the same machine, while providing an improved accuracy by 7.51%.…”

Section: Computational Performance Evaluation (mentioning)

confidence: 99%
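The speed figures in the statement above can be turned into per-frame latencies with simple arithmetic. The sketch below only uses the two numbers quoted (161 FPS and a 17 FPS lead); the ESPNet figure is the value implied by that difference, and the helper name `latency_ms` is an assumption introduced for illustration.

```python
liteseg_fps = 161                     # reported for LiteSeg at 640 x 360
fps_gap = 17                          # reported lead over ESPNet on the same machine
espnet_fps = liteseg_fps - fps_gap    # implied ESPNet speed, not quoted directly


def latency_ms(fps):
    """Convert frames per second into milliseconds per frame."""
    return 1000.0 / fps


liteseg_latency = latency_ms(liteseg_fps)   # about 6.21 ms per frame
espnet_latency = latency_ms(espnet_fps)     # about 6.94 ms per frame
```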

“…Many approaches have been proposed to deal with this problem, e.g., ERFNet [12] employed a residual connection and depthwise separable convolution to increase receptive field to achieve high accuracy with a reasonable performance. Alternatively, ESPNet [13] proposed an efficient module called efficient spatial pyramid (ESP), which uses point wise convolution and spatial pyramid of dilated convolution. ESPnet along with Enet provide a lightweight architectures but with a degradation in accuracy.…”

Section: Introduction (mentioning)

confidence: 99%
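To make the "spatial pyramid of dilated convolution" in the statement above concrete, the sketch below computes the effective kernel size of each branch in such a pyramid. A dilated n x n kernel with rate d spans (n - 1) * d + 1 pixels per side. The dilation rates 1, 2, 4, ... and the default of four branches are assumptions chosen for illustration; the quoted statement does not specify them.

```python
def effective_kernel_size(n, dilation):
    """Side length covered by an n x n kernel with the given dilation rate."""
    return (n - 1) * dilation + 1


def pyramid_kernel_sizes(n=3, branches=4):
    """Effective sizes for branches with assumed dilation rates 1, 2, 4, ..."""
    return [effective_kernel_size(n, 2 ** k) for k in range(branches)]


sizes = pyramid_kernel_sizes()  # rates 1, 2, 4, 8 with 3 x 3 kernels
```

Each branch keeps the same number of weights (n x n per channel pair) while covering a progressively larger area, which is why a pyramid of dilated kernels can enlarge the receptive field cheaply.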


LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation

Emara, Munim, Abbas (2019)

2019 Digital Image Computing: Techniques and Applications (DICTA)


Semantic image segmentation plays a pivotal role in many vision applications including autonomous driving and medical image analysis. Most of the former approaches move towards enhancing the performance in terms of accuracy with little awareness of computational efficiency. In this paper, we introduce LiteSeg, a lightweight architecture for semantic image segmentation. In this work, we explore a new deeper version of Atrous Spatial Pyramid Pooling module (ASPP) and apply short and long residual connections, and depthwise separable convolution, resulting in a faster and more efficient model. LiteSeg architecture is introduced and tested with multiple backbone networks such as Darknet19, MobileNet, and ShuffleNet to provide multiple trade-offs between accuracy and computational cost. The proposed model LiteSeg, with MobileNetV2 as a backbone network, achieves an accuracy of 67.81% mean intersection over union at 161 frames per second with 640 × 360 resolution on the Cityscapes dataset.

Index Terms: semantic image segmentation, atrous spatial pyramid pooling, encoder decoder, and depthwise separable convolution.


“…Due to the efficiency of ENet, it can be used for the tasks requiring low latency operations. Efficient Spatial Pyramid Network (ESPNet) [50] and Efficient Residual Factorized Network (ERFNet) [28] are another two efficient real-time semantic segmentation methods, which are faster and more accurate than ENet using the similar number of parameters. In particular, ESPNet makes use of the Efficient Spatial Pyramid module (ESP), which follows the convolution factorization principle that decomposes a standard convolution into a pointwise convolution and a spatial pyramid of atrous convolutions.…”

Section: B Real-time Semantic Segmentation Methods (mentioning)

confidence: 99%
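The factorization described in the statement above can be illustrated by counting weights. The sketch below compares a standard n x n convolution mapping M input channels to N output channels with a factorized version: a pointwise (1 x 1) convolution that reduces M channels to N / K, followed by K parallel n x n dilated convolutions on N / K channels each. The function names, the choice M = N = 128, K = 4, and the requirement that N be divisible by K are assumptions for this example; biases and any normalization layers are ignored.

```python
def standard_conv_params(n, m, n_out):
    """Weights in a standard n x n convolution from m to n_out channels."""
    return n * n * m * n_out


def factorized_conv_params(n, m, n_out, k):
    """Weights in a pointwise reduce step plus k parallel dilated n x n branches."""
    if n_out % k != 0:
        raise ValueError("this sketch assumes n_out is divisible by k")
    d = n_out // k                  # channels per pyramid branch
    pointwise = m * d               # 1 x 1 convolution, m -> d channels
    branches = k * n * n * d * d    # k dilated convolutions, d -> d channels each
    return pointwise + branches


standard = standard_conv_params(3, 128, 128)
factorized = factorized_conv_params(3, 128, 128, 4)
reduction = standard / factorized
```

Dilation does not change the weight count of a branch, so the saving in this sketch comes entirely from the channel reduction and the split into K narrower branches.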

“…Therefore, the segmentation accuracy can be greatly improved without increasing much computational burden. On the other hand, the gridding issue caused by the atrous convolution operations [19], [50] can be alleviated to some extent (see Fig. 7 for an illustration).…”

Section: B Lightweight Baseline Network With Atrous Convolution And (mentioning)

confidence: 99%
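The gridding issue mentioned in the statement above can be shown in one dimension. The sketch below lists which input offsets contribute to a single output position after stacking dilated convolutions. It is an illustration of the effect only, under the assumption of odd kernel sizes, and does not reproduce the remedy used by either cited method; the helper name `reachable_offsets` is hypothetical.

```python
def reachable_offsets(kernel_size, dilations):
    """Input offsets touched by a stack of 1-D dilated convolutions.

    Offsets are relative to one output position; kernel_size is assumed odd.
    """
    offsets = {0}
    half = kernel_size // 2
    for d in dilations:
        taps = [k * d for k in range(-half, half + 1)]
        # Every offset reached so far fans out through this layer's taps.
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)


same_rate = reachable_offsets(3, [2, 2])   # two layers, both with rate 2
mixed_rate = reachable_offsets(3, [1, 2])  # rate 1 followed by rate 2
```

With the same rate in both layers only even offsets are reached, so neighbouring inputs never influence the output (the grid pattern). Mixing rates fills in the gaps in this toy case.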

Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes

Dong, Yan, Shen et al. (2021)

IEEE Trans. Intell. Transport. Syst.


Deep Convolutional Neural Networks (DCNNs) have recently shown outstanding performance in semantic image segmentation. However, state-of-the-art DCNN-based semantic segmentation methods usually suffer from high computational complexity due to the use of complex network architectures. This greatly limits their applications in the real-world scenarios that require real-time processing. In this paper, we propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes, which achieves a good trade-off between accuracy and speed. Specifically, a Lightweight Baseline Network with Atrous convolution and Attention (LBN-AA) is firstly used as our baseline network to efficiently obtain dense feature maps. Then, the Distinctive Atrous Spatial Pyramid Pooling (DASPP), which exploits the different sizes of pooling operations to encode the rich and distinctive semantic information, is developed to detect objects at multiple scales. Meanwhile, a Spatial detail-Preserving Network (SPN) with shallow convolutional layers is designed to generate high-resolution feature maps preserving the detailed spatial information. Finally, a simple but practical Feature Fusion Network (FFN) is used to effectively combine both shallow and deep features from the semantic branch (DASPP) and the spatial branch (SPN), respectively. Extensive experimental results show that the proposed method respectively achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps on the challenging Cityscapes and CamVid test datasets (by only using a single NVIDIA TITAN X card). This demonstrates that the proposed method offers excellent performance at the real-time speed for semantic segmentation of urban street scenes.


Deriving and interpreting robust features for survival prediction of brain tumor patients

Rajput, Kapdi, Raval et al. (2024)

Int J Imaging Syst Tech


Accurate prediction of survival days (SD) is vital for planning treatments in glioma patients, as type‐IV tumors typically have a poor prognosis and meager survival rates. SD prediction is challenging and heavily dependent on the extracted feature sets. Additionally, comprehending the behavior of complex machine learning models is a vital yet challenging aspect, particularly to integrate such models into the medical domain responsibly. Therefore, this study develops a robust feature set and an ensemble‐based regressor model to predict patients' SD accurately. We aim to understand how these features behave and contribute to predicting SD. To accomplish this, we employed post‐hoc interpretable techniques, precisely Shapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), and Accumulated Local Effects (ALE) plots. Furthermore, we introduced an investigation to establish a direct connection between radiomic features and their biological significance to enhance the interpretability of radiomic features. The best SD scores on the BraTS2020 training set are 0.504 for accuracy, 59927.38 mean squared error (MSE), 20101.86 median squared error (medianSE), and 0.725 Spearman ranking coefficient (SRC). The validation set's accuracy is 0.586, MSE is 76529.43, medianSE is 41402.78, and SRC is 0.52. The proposed predictor model exhibited superior performance compared with leading contemporary approaches across multiple performance metrics.
