Improving Semantic Segmentation via Video Propagation and Label Relaxation

Zhu, Yi; Sapra, Karan; Reda, Fitsum A.; Shih, Kevin J.; Newsam, Shawn; Tao, Andrew; Catanzaro, Bryan

doi:10.1109/cvpr.2019.00906

Cited by 362 publications

(300 citation statements)

References 42 publications

(62 reference statements)

Mentioning: 296

“…Xu et al [16] applied different segmentation strategies to various regions of the input image, which exploited optical flow to preserve the semantics in static regions. Zhu et al [17] investigated the generation of future semantic segmentation labels from current manual labels by video prediction based on motion vector estimation.…”

Section: B. Semantics Sharing (mentioning)

confidence: 99%

Boosting Real-Time Driving Scene Parsing With Shared Semantics

Xiang, Bao, et al. 2020, IEEE Robot. Autom. Lett.

Real-time scene parsing is a fundamental feature for autonomous driving vehicles with multiple cameras. In this letter we demonstrate that sharing semantics between cameras with different perspectives and overlapped views can boost the parsing performance when compared with traditional methods, which individually process the frames from each camera. Our framework is based on a deep neural network for semantic segmentation but with two kinds of additional modules for sharing and fusing semantics. On the one hand, a semantics sharing module is designed to establish the pixel-wise mapping between the input images. Features as well as semantics are shared by the map to reduce duplicated workload which leads to more efficient computation. On the other hand, feature fusion modules are designed to combine different modal of semantic features, which leverage the information from both inputs for better accuracy. To evaluate the effectiveness of the proposed framework, we have applied our network to a dual-camera vision system for driving scene parsing. Experimental results show that our network outperforms the baseline method on the parsing accuracy with comparable computations.

“…A second approach to handle the problem of implausible prediction outputs that lack realism is to reduce the complexity of the problem. Many authors, for example, used data with lower-dimensional image content, such as label images, instead of natural image scenes [12,15,22-25]. Others split the problem into two problems, motion and content prediction, and learn separate representations for the static and dynamic components.…”

Section: Related Work (mentioning)

confidence: 99%

The Importance of Loss Functions for Increasing the Generalization Abilities of a Deep Learning-Based Next Frame Prediction Model for Traffic Scenes

Aigner, Körner 2020, MAKE

This paper analyzes in detail how different loss functions influence the generalization abilities of a deep learning-based next frame prediction model for traffic scenes. Our prediction model is a convolutional long-short term memory (ConvLSTM) network that generates the pixel values of the next frame after having observed the raw pixel values of a sequence of four past frames. We trained the model with 21 combinations of seven loss terms using the Cityscapes Sequences dataset and an identical hyper-parameter setting. The loss terms range from pixel-error based terms to adversarial terms. To assess the generalization abilities of the resulting models, we generated predictions up to 20 time-steps into the future for four datasets of increasing visual distance to the training dataset: KITTI Tracking, BDD100K, UA-DETRAC, and KIT AIS Vehicles. All predicted frames were evaluated quantitatively with both traditional pixel-based evaluation metrics, that is, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM), and recent, more advanced, feature-based evaluation metrics, that is, Fréchet inception distance (FID), and learned perceptual image patch similarity (LPIPS). The results show that solely by choosing a different combination of losses, we can boost the prediction performance on new datasets by up to 55%, and by up to 50% for long-term predictions.

“…With an evaluation time of one second, this algorithm is too slow for robotic applications or automated driving. (Zhu et al, 2019) show a video-based approach to further improve the segmentation process by propagating labels between two frames jointly. In contrast to that, (Chen et al, 2018) show a network capable of close to real-time semantic segmentation.…”

Section: Related Work (mentioning)

confidence: 99%

Concept on Landmark Detection in Road Scene Images Taken From a Top-View Camera System

Albrecht, Kraus, Stilla 2020, Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.

Abstract. In this paper, we demonstrate the inclusion of a top-view camera system mounted on a city bus in an existing sensor setup. A novel sensor setup with five down-facing cameras is mounted on the roof of a MAN Lion’s City 12 city bus to extract landmarks in road scene images. Its positioning is validated by an exemplary detection of lane markings. The concept for further landmark detection with the help of the presented camera system is explained in this paper and sensor data fusion methods are proposed. Based on our previous findings (Albrecht et al., 2019), strengths of the novel sensor system are introduced to improve the current environment perception system. For now, only a qualitative observation of the capability to detect lane markings and other landmarks can be presented. Future work will use the current findings for landmark detection for a vehicle self-localization system.
