Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving

Varghese, Serin; Bayzidi, Yasin; Bär, Andreas; Kapoor, Nikhil; Lahiri, Sounak; Schneider, Ján; Schmidt, Nico M.; Schlicht, Peter; Hüger, Fabian; Fingscheidt, Tim

doi:10.1109/cvprw50498.2020.00176

Cited by 30 publications

(15 citation statements)

References 30 publications

“…Among online-capable metrics for the overall performance prediction, some only focus at malfunction detection and correction (again involving an ensemble of DNNs) [17], [64], or exploit temporal inconsistency between consecutive predictions [15], which has to be defined in a highly task-specific way. The closest prior work to ours is presumably from Löhdefink et al [18], who propose to train an autoencoder to reconstruct an image on the same data a semantic segmentation DNN is trained on, showing a correlation between both task's metrics.…”

Section: Performance Prediction Of Neural Network

mentioning

confidence: 99%

“…one typically assumes that an offline-measured performance of a DNN is also valid in inference, this is actually not true due to the mentioned environment changes. Meanwhile, less-frequently proposed online-capable algorithms are either task-specific [15], rely on ensembles of DNNs [16], [17], or only show the correlation of a proposed metric to the absolute performance metric without further outlining an online-capable predictive scheme [15], [18]. Naively using the confidence scores of the network itself [19] is not recommended as DNNs often assign a probability of close to one to a single class [20], and even more important, the uncertainty of measurements does not bear predictive power to estimate the absolute DNN performance.…”

mentioning

confidence: 99%

Online Performance Prediction of Perception DNNs by Multi-Task Learning With Depth Estimation

Klingner

Fingscheidt

2021

IEEE Trans. Intell. Transport. Syst.

Self Cite

Online performance prediction (or: observation) of deep neural networks (DNNs) in highly automated driving presents an unsolved task until now, as most DNNs are evaluated offline requiring datasets with ground truth labels. In practice, however, DNN performance depends on the used camera type, lighting and weather conditions, and on various other kinds of domain shift. Also, the input to DNN-based perception systems can be perturbed by adversarial attacks requiring means to detect these input perturbations. In this work we propose a method to mitigate these problems by a multi-task learning approach with monocular depth estimation as a secondary task, which enables us to predict the DNN's performance for various other (primary) tasks by evaluating only the depth estimation task with a physical depth measurement provided, e.g., by a LiDAR sensor. We show the effectiveness of our method for the primary task of semantic segmentation using various training datasets, test datasets, model architectures, and input perturbations. Our method provides an effective way to predict (observe) the performance of DNNs for semantic segmentation even on a single-image basis and is transferable to other primary DNN-based perception tasks in a straightforward manner.

“…However, misalignment of key frames with nearby frames might harm accuracy relative to the original image segmentation models. Another use of optical flow is in [26], where the authors introduce a flow-based consistency measure to evaluate, rather than directly improve, the quality of video semantic segmentation.…”

Section: Related Work

mentioning

confidence: 99%

Temporally stable video segmentation without video annotations

Azulay

Halperin

Vantzos

et al. 2021

Preprint

Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multioutput decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.

“…This, however, does not factor in motion. As such, researchers incorporate motion estimation (e.g., optical flow) when measuring temporal consistency [16,24,19,33]. However, estimating accurate flow on real-world data can be very challenging, and in many cases, more error-prone and time-consuming than the segmentation task itself.…”

Section: Related Work

mentioning

confidence: 99%

“…This approach, however, does not factor in the object movements and changing occlusions. Most of the recent works [16,24,33] utilize motion-based pixel correspondence between two consecutive frames (i.e., optical flow [13]), to measure temporal consistency. More specifically, given two consecutive video frames, the segmentation of one frame is warped to the other based on the estimated flow, and the warped and actual segmentation maps are then compared to measure the segmentation consistency between these two frames.…”

Section: Introduction

mentioning

confidence: 99%
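
The warp-and-compare procedure described in the last citation statement can be sketched roughly as follows. This is a minimal illustration, not the implementation of any cited work. It assumes a dense backward flow field (pointing from each pixel of the current frame to its location in the previous frame) is already available, uses nearest-neighbour warping, and scores consistency as plain pixel agreement; the function names `warp_labels` and `temporal_consistency` are hypothetical, and published metrics may compare the two maps differently (for example with an IoU-style score).

```python
import numpy as np


def warp_labels(prev_labels, flow_t_to_prev):
    """Warp the label map of frame t-1 into frame t (nearest neighbour).

    Assumption: flow_t_to_prev[y, x] = (dx, dy) points from pixel (x, y)
    in frame t to its corresponding location in frame t-1.
    Returns the warped labels and a mask of pixels whose source location
    falls inside the image.
    """
    h, w = prev_labels.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow_t_to_prev[..., 0]).astype(int)
    src_y = np.rint(ys + flow_t_to_prev[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros_like(prev_labels)
    warped[valid] = prev_labels[src_y[valid], src_x[valid]]
    return warped, valid


def temporal_consistency(prev_labels, curr_labels, flow_t_to_prev):
    """Fraction of valid pixels where warped and actual labels agree."""
    warped, valid = warp_labels(prev_labels, flow_t_to_prev)
    if not valid.any():
        return float("nan")
    return float(np.mean(warped[valid] == curr_labels[valid]))


# Toy example: a one-pixel-wide object moves one pixel to the right.
prev_labels = np.zeros((4, 4), dtype=int)
prev_labels[:, 1] = 1
curr_labels = np.zeros((4, 4), dtype=int)
curr_labels[:, 2] = 1

flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0  # each pixel in frame t came from one pixel to the left

score_with_flow = temporal_consistency(prev_labels, curr_labels, flow)
score_without_flow = temporal_consistency(prev_labels, curr_labels, np.zeros((4, 4, 2)))
```

In the toy example, comparing the frames with the correct flow gives a higher score than comparing them directly, which is the point made in the statements above: a consistency measure that ignores motion penalizes predictions that are in fact stable. The sketch does not handle occlusions or flow errors, which the same statements name as the main practical difficulty.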