ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Paszke, Adam; Chaurasia, Abhishek; Kim, Sangpil; Culurciello, Eugenio

doi:10.48550/arxiv.1606.02147

Cited by 467 publications

(771 citation statements)

References 30 publications

Mentioning citation statements: 688


“…For the details of our bottleneck, see Section 3.2. Inspired by ENet [37], standard SE-ResNeXt blocks and SE-ResNeXt blocks with dilated convolution are connected in series to form our backbone. See Table 1 for the detailed architecture of the network.…”

Section: SE-ResNeXt Block (mentioning)

confidence: 99%
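The quoted passage chains plain and dilated blocks in series, following ENet. As a rough illustration of why dilation is used there, the sketch below computes the receptive field of a stack of convolutions. It assumes stride 1 and no pooling, and the function name `receptive_field` and the 2, 4, 8, 16 dilation schedule are illustrative choices, not the cited paper's configuration.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in pixels) of a stack of stride-1 convolutions.

    A layer with kernel size k and dilation d widens the field by (k - 1) * d.
    Assumption: stride 1 everywhere and no pooling between layers.
    """
    if len(kernel_sizes) != len(dilations):
        raise ValueError("need exactly one dilation per layer")
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf


# Four 3x3 layers with an ENet-like dilation schedule (assumed for illustration).
dilated = receptive_field([3, 3, 3, 3], [2, 4, 8, 16])
plain = receptive_field([3, 3, 3, 3], [1, 1, 1, 1])
print(dilated, plain)  # 61 9
```

Under these assumptions the same four layers cover 61 pixels instead of 9 without any extra parameters, which is the usual motivation for dilated convolutions in real-time segmentation backbones.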

Deep Leaning-Based Ultra-Fast Stair Detection

Wang, Pei, Qiu et al. 2022 (preprint)


Staircases are some of the most common building structures in urban environments. Stair detection is an important task for various applications, including the environmental perception of exoskeleton robots, humanoid robots, and rescue robots, and the navigation of visually impaired people. Most existing stair detection algorithms have difficulty dealing with the diversity of stair structure materials, extreme lighting, and severe occlusion. Inspired by human perception, we propose an end-to-end method based on deep learning. Specifically, we treat stair line detection as a multitask problem involving coarse-grained semantic segmentation and object detection. The input images are divided into cells, and a simple neural network is used to judge whether each cell contains stair lines. For cells containing stair lines, the locations of the stair lines relative to each cell are regressed. Extensive experiments on our dataset show that our method achieves high performance in terms of both speed and accuracy. A lightweight version can even reach 300+ frames per second at the same resolution. Our code and dataset will soon be available on GitHub.
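One possible way to build the per-cell targets this abstract describes (a "contains a stair line" label plus the line's position relative to the cell) is sketched below. The endpoint parameterization, the Liang-Barsky clipping step, the one-line-per-cell rule and the names `encode_cells`/`decode_cells` are all assumptions for illustration, not the authors' published formulation.

```python
def clip_segment(x0, y0, x1, y1, xmin, ymin, xmax, ymax):
    """Liang-Barsky clipping of a segment to a box; returns the clipped segment or None."""
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0), (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:  # parallel to this edge and outside it
                return None
            continue
        t = q / p
        if p < 0:  # entering the box
            t0 = max(t0, t)
        else:  # leaving the box
            t1 = min(t1, t)
    if t0 >= t1:  # no overlap, or only a single touching point
        return None
    return (x0 + t0 * dx, y0 + t0 * dy, x0 + t1 * dx, y0 + t1 * dy)


def encode_cells(segments, img_w, img_h, cell):
    """Map each grid cell crossed by a stair line to cell-relative endpoints in [0, 1].

    Cells absent from the result are negatives for the classification branch.
    Assumption: at most one stair line per cell (a later segment overwrites).
    """
    targets = {}
    for x0, y0, x1, y1 in segments:
        for r in range(img_h // cell):
            for c in range(img_w // cell):
                left, top = c * cell, r * cell
                clipped = clip_segment(x0, y0, x1, y1, left, top, left + cell, top + cell)
                if clipped is not None:
                    cx0, cy0, cx1, cy1 = clipped
                    targets[(r, c)] = ((cx0 - left) / cell, (cy0 - top) / cell,
                                       (cx1 - left) / cell, (cy1 - top) / cell)
    return targets


def decode_cells(targets, cell):
    """Invert encode_cells: cell-relative offsets back to image coordinates."""
    return [(c * cell + u0 * cell, r * cell + v0 * cell,
             c * cell + u1 * cell, r * cell + v1 * cell)
            for (r, c), (u0, v0, u1, v1) in sorted(targets.items())]


t = encode_cells([(0, 40, 128, 40)], img_w=128, img_h=128, cell=32)
print(sorted(t))  # [(1, 0), (1, 1), (1, 2), (1, 3)]
```

Cells missing from the returned dictionary act as negatives; a network trained this way would predict both the label and the four offsets for every cell.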


“…Image semantic segmentation is an active research field that has seen significant progress since the pioneering work applying fully convolutional networks for the task [6]. Subsequent methods have focused on high quality [7] [8] [9] [10] [11] [12] and/or efficient [13] [14] [15] [16] design choices. More recently, the design of models for video semantic segmentation has received increasing attention.…”

Section: A. Image and Video Semantic Segmentation (mentioning)

confidence: 99%

Temporally Constrained Neural Networks (TCNN): A framework for semi-supervised video semantic segmentation

Alapatt, Mascagni, Vardazaryan et al. 2021 (preprint)


A major obstacle to building models for effective semantic segmentation, and particularly video semantic segmentation, is a lack of large and well-annotated datasets. This bottleneck is particularly prohibitive in highly specialized and regulated fields such as medicine and surgery, where video semantic segmentation could have important applications but data and expert annotations are scarce. In these settings, temporal cues and anatomical constraints could be leveraged during training to improve performance. Here, we present Temporally Constrained Neural Networks (TCNN), a semi-supervised framework for semantic segmentation of surgical videos. In this work, we show that autoencoder networks can be used to efficiently provide both spatial and temporal supervisory signals to train deep learning models. We test our method on a newly introduced video dataset of laparoscopic cholecystectomy procedures, Endoscapes, and an adaptation of a public dataset of cataract surgeries, CaDIS. We demonstrate that lower-dimensional representations of predicted masks can be leveraged to provide a consistent improvement on both sparsely labeled datasets with no additional computational cost at inference time. Further, the TCNN framework is model-agnostic and can be used in conjunction with other model design choices with minimal additional complexity.


“…The last layer of the decoder is the softmax layer, which is used to classify pixels. The decoder of RailNet is trained to output binary segmentation maps, indicating which pixels belong to a rail line and which do not [23].…”

Section: Decoder (mentioning)

confidence: 99%
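The quoted decoder ends in a per-pixel softmax that yields a binary rail/background map. Below is a minimal sketch of that last step, assuming a two-channel output (background, rail) stored as nested Python lists; RailNet's actual tensor layout is not given in the excerpt, and `binary_mask` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_mask(logits):
    """logits[h][w] = [background_score, rail_score]; returns a 0/1 rail mask."""
    mask = []
    for row in logits:
        out_row = []
        for scores in row:
            probs = softmax(scores)
            out_row.append(1 if probs[1] > probs[0] else 0)  # 1 = rail pixel
        mask.append(out_row)
    return mask


logits = [[[2.0, -1.0], [0.0, 3.0]],
          [[1.0, 1.5], [4.0, 0.0]]]
print(binary_mask(logits))  # [[0, 1], [1, 0]]
```

Because softmax is monotonic, taking the larger probability gives the same mask as comparing the raw logits; the softmax matters mainly when calibrated per-pixel probabilities are needed, for example when training with cross-entropy.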

“…As mentioned in the previous section, RailNet outputs a set of pixels for the rail lines. Fitting polynomials to these pixels in the original image space is not ideal, so one has to resort to higher-order polynomials to handle curved rail lines [23]. A generally accepted solution to this problem is to project the image into a "bird's eye" representation, where the rail lines are parallel to each other, so that curved rail lines can be fitted with second- to third-order polynomials.…”

Section: The Rail Line Fitting Algorithm Based On Sliding Window Dete... (mentioning)

confidence: 99%
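In the bird's-eye view described above, each rail is close to vertical, so it is natural to fit x as a polynomial in y. The sketch below fits a second-order polynomial by least squares (normal equations solved with Gauss-Jordan elimination) and evaluates the standard radius-of-curvature formula for that curve. The function names are hypothetical, coordinates are in bird's-eye pixels, and converting to metres would need a calibration scale that the excerpt does not specify.

```python
import math


def fit_quadratic(ys, xs):
    """Least-squares fit of x = a*y**2 + b*y + c; returns (a, b, c)."""
    # Normal equations (A^T A) p = A^T x for design-matrix rows [y^2, y, 1].
    s = [sum(y ** k for y in ys) for k in range(5)]                # s[k] = sum of y^k
    t = [sum(x * y ** k for x, y in zip(xs, ys)) for k in range(3)]  # t[k] = sum of x*y^k
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def radius_of_curvature(a, b, y):
    """R = (1 + (2ay + b)^2)^(3/2) / |2a| for x = a*y^2 + b*y + c; infinite if straight."""
    if a == 0:
        return math.inf
    return (1 + (2 * a * y + b) ** 2) ** 1.5 / abs(2 * a)


ys = [0, 1, 2, 3, 4, 5]
xs = [0.5 * y * y - 2 * y + 3 for y in ys]
a, b, c = fit_quadratic(ys, xs)
print(round(a, 6), round(b, 6), round(c, 6))  # 0.5 -2.0 3.0
```

A third-order fit, which the excerpt also mentions, would follow the same pattern with a 4x4 normal-equation system.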

Accurate and Lightweight RailNet for Real-Time Rail Line Detection

et al. 2021


Railway transportation has always occupied an important position in daily life and social progress. In recent years, computer vision has made promising breakthroughs in intelligent transportation, providing new ideas for detecting rail lines. Yet most rail line detection algorithms use traditional image processing to extract features, and their detection accuracy and real-time performance remain to be improved. This paper goes beyond these limitations and proposes a rail line detection algorithm based on deep learning. First, an accurate and lightweight RailNet is designed, which takes full advantage of the powerful semantic feature extraction capabilities of deep convolutional neural networks to obtain high-level features of rail lines. A Segmentation Soul (SS) module is added to the RailNet structure, improving segmentation performance without any additional inference time. Depthwise convolution (DWconv) is introduced in RailNet to reduce the number of network parameters and ensure real-time detection. Then, based on the binary segmentation maps output by RailNet, we propose a rail line fitting algorithm based on sliding window detection and apply the inverse perspective transformation. The polynomial functions and curvature of the rail lines are thus calculated, and the rail lines are identified in the original images. Furthermore, we collect a real-world rail line dataset, named RAWRail. The proposed algorithm has been fully validated on the RAWRail dataset, running at 74 FPS with an accuracy of 98.6%, which is superior to current rail line detection algorithms and shows strong potential in real applications.
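The abstract mentions a rail line fitting algorithm based on sliding window detection. The sketch below is the classic lane-detection version of that idea, applied to a single rail in a binary bird's-eye mask; RailNet's exact variant may differ, and `sliding_window_pixels` with its parameters `n_windows`, `margin` and `min_pix` are hypothetical.

```python
def sliding_window_pixels(mask, n_windows=4, margin=2, min_pix=1):
    """Collect (y, x) pixels of one rail from a binary bird's-eye mask (list of rows).

    Seeds at the column-histogram peak of the bottom half, then steps windows
    upward, re-centring each time on the mean x of the pixels found.
    """
    h, w = len(mask), len(mask[0])
    hist = [sum(mask[y][x] for y in range(h // 2, h)) for x in range(w)]
    x_center = max(range(w), key=lambda x: hist[x])
    win_h = h // n_windows
    found = []
    for i in range(n_windows):
        y_hi = h - i * win_h  # exclusive bottom edge of this window
        y_lo = 0 if i == n_windows - 1 else max(0, y_hi - win_h)  # last window reaches the top
        x_lo, x_hi = max(0, x_center - margin), min(w, x_center + margin + 1)
        pts = [(y, x) for y in range(y_lo, y_hi)
               for x in range(x_lo, x_hi) if mask[y][x]]
        found.extend(pts)
        if len(pts) >= min_pix:
            x_center = round(sum(x for _, x in pts) / len(pts))
    return found


# Toy 8x10 mask: a rail drifting from x=4 (bottom) to x=5 (top), plus one noise pixel.
mask = [[0] * 10 for _ in range(8)]
for y in range(4):
    mask[y][5] = 1
for y in range(4, 8):
    mask[y][4] = 1
mask[0][0] = 1
print(len(sliding_window_pixels(mask)))  # 8
```

The collected pixels would then be fitted with a polynomial and mapped back to the original image through the inverse perspective transformation. A mask containing two rails would need one seed per rail, for instance the histogram peaks in the left and right halves.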
