Layered Optical Flow Estimation Using a Deep Neural Network with a Soft Mask

Zhang, Xi; Ma, Di; Ouyang, Xu; Jiang, Shanshan; Gan, Lin; Agam, Gady

doi:10.24963/ijcai.2018/163

Cited by 5 publications

(9 citation statements)

References 11 publications


“…In the future, the approach proposed in this study will be extended with better optical flow estimation methods, such as the deep neural network method [29], which promise better computation time. Our method also needs to be evaluated on other video datasets.…”

Section: Discussion (mentioning)

confidence: 99%

Movement Direction Estimation on Video using Optical Flow Analysis on Multiple Frames

Solichin, Harjoko, Putra. 2018. IJACSA


This study proposes a model for determining the movement direction of an object based on optical flow features. To reduce computation time, the optical flow features are summarized as Histograms of Oriented Optical Flow (HOOF), which are extracted locally on a grid of a given cell size. To determine the movement direction, multiple frames are analyzed at once. The experimental results show good movement detection performance: 93% accuracy, 73.07% precision, and 84.25% recall. Testing with the best parameters yields 98.1% accuracy, 35.6% precision, 41.2% recall, and a direction detection error rate (DDER) of 25.28%. The results are expected to benefit video analysis tasks such as detecting riots and abnormal movement in public places.
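As a rough illustration of the HOOF idea mentioned in this abstract, the sketch below bins the flow vectors of a single grid cell by orientation, weights each vote by flow magnitude, and normalizes the result. It is a simplified assumption-based example, not the cited authors' implementation: the function name `hoof`, the uniform binning over the full circle, and the default of 8 bins are all hypothetical choices.

```python
import math


def hoof(flow, n_bins=8):
    """Magnitude-weighted orientation histogram for one grid cell.

    flow   : iterable of (u, v) optical flow vectors (hypothetical input format)
    n_bins : number of uniform orientation bins over [0, 2*pi) (assumed)
    """
    hist = [0.0] * n_bins
    for u, v in flow:
        mag = math.hypot(u, v)
        if mag == 0.0:
            continue  # zero vectors have no direction to vote for
        # Map the angle from (-pi, pi] into [0, 2*pi)
        angle = math.atan2(v, u) % (2 * math.pi)
        # Clamp guards against rounding that lands exactly on 2*pi
        idx = min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)
        hist[idx] += mag
    total = sum(hist)
    # Normalize so the histogram sums to 1 when any motion is present
    return [h / total for h in hist] if total > 0 else hist


# Three equal-magnitude vectors pointing into three different quadrants
print(hoof([(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0)], n_bins=4))
```

Computing one such histogram per grid cell gives a compact descriptor that is much smaller than the dense flow field, which is consistent with the abstract's stated goal of reducing computation time.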


“…In contrast, our method uses a fully end-to-end CNN and only unlabelled RGB image sequences for training and inference. In [14], Zhang et al. propose a CNN-based layered optical flow estimation that relies on their "soft-mask" module to separate flow into disjoint classes, but they do not synthesise images using layers. Our LDIS pipeline remains faithful to the traditional layered approach and provides explicit constraints during grouping of pixels using affine motion models, allowing us to confidently identify motion-homogeneous regions.…”

Section: A Video Object Segmentation (mentioning)

confidence: 99%

“…We explicitly ensure that layer membership is disjoint using a modified maxout operation inspired by [14]. For each pixel, the maxout operation retains the maximal value of the two alpha maps; the non-maximal value is set to 0.…”

Section: A Layered Differentiable Image Synthesis (mentioning)

confidence: 99%
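The per-pixel maxout described in the statement above can be sketched in a few lines. This is a minimal plain-Python illustration under stated assumptions, not the citing paper's code: the function name `disjoint_maxout`, the nested-list image format, and the tie-breaking rule (ties go to the first map) are hypothetical.

```python
def disjoint_maxout(alpha_a, alpha_b):
    """Keep the larger of two alpha values at each pixel and zero the other.

    alpha_a, alpha_b : 2-D nested lists of equal shape (assumed format)
    Returns two maps whose non-zero supports do not overlap.
    """
    out_a, out_b = [], []
    for row_a, row_b in zip(alpha_a, alpha_b):
        new_a, new_b = [], []
        for a, b in zip(row_a, row_b):
            if a >= b:  # tie goes to the first map (an assumption)
                new_a.append(a)
                new_b.append(0.0)
            else:
                new_a.append(0.0)
                new_b.append(b)
        out_a.append(new_a)
        out_b.append(new_b)
    return out_a, out_b


a = [[0.9, 0.2], [0.5, 0.1]]
b = [[0.1, 0.8], [0.5, 0.9]]
out_a, out_b = disjoint_maxout(a, b)
print(out_a)  # [[0.9, 0.0], [0.5, 0.0]]
print(out_b)  # [[0.0, 0.8], [0.0, 0.9]]
```

After the operation, every pixel has a non-zero value in at most one of the two maps, which is what makes the layer membership disjoint.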

“…Sevilla-Lara et al. [13] combine a layered model and a neural network, but their method is a hybrid approach where a pretrained network is combined with a variational expectation maximisation algorithm for inference. In [14], Zhang et al. use a maxout operation to perform disjoint separation of flow. However, they do not apply any explicit constraints during flow separation, and it is unclear whether flow is reliably grouped based on motion homogeneity.…”

Section: Introduction (mentioning)

confidence: 99%


Learning To Segment Dominant Object Motion From Watching Videos

Sahir, Armin, Li et al. 2021. Preprint


Existing deep learning based unsupervised video object segmentation methods still rely on ground-truth segmentation masks to train. Unsupervised in this context only means that no annotated frames are used during inference. As obtaining ground-truth segmentation masks for real image scenes is a laborious task, we envision a simple framework for dominant moving object segmentation that neither requires annotated data to train nor relies on saliency priors or pre-trained optical flow maps. Inspired by a layered image representation [1], we introduce a technique to group pixel regions according to their affine parametric motion. This enables our network to learn segmentation of the dominant foreground object using only RGB image pairs as input for both training and inference. We establish a baseline for this novel task using a new MovingCars dataset and show competitive performance against recent methods that require annotated masks to train.


“…Based on convolutional neural networks pre-trained for image classification, a DCNN can learn information about salient objects at any position in the input image. In [25], a soft-mask module is added to an optical flow estimation network, which aims to mask out parts with consistent motion. The mask filters are trained while the pre-trained weights are kept fixed.…”

Section: Introduction (mentioning)

confidence: 99%

Mask Sparse Representation Based on Semantic Features for Thermal Infrared Target Tracking

Peng, Chen et al. 2019. Remote Sensing


Thermal infrared (TIR) target tracking is a challenging task, as it entails learning an effective model to identify the target under poor target visibility and cluttered backgrounds. Sparse representation, a typical appearance modeling approach, has been successfully exploited in TIR target tracking. However, the discriminative information of the target and its surrounding background is usually neglected in the sparse coding process. To address this issue, we propose a mask sparse representation (MaskSR) model, which combines sparse coding with high-level semantic features for TIR target tracking. We first obtain the pixel-wise labeling results of the target and its surrounding background in the last frame, and then use these results to train target-specific deep networks in a supervised manner. From the output features of the deep networks, a high-level pixel-wise discriminative map of the target area is obtained. We introduce the binarized discriminative map as a mask template into the sparse representation and develop a novel algorithm to collaboratively represent the reliable and unreliable target parts partitioned by the mask template, which explicitly indicates their different discriminative capabilities with labels 1 and 0. The proposed MaskSR model controls the dominance of the reliable target part in the reconstruction process via a weighting scheme. We solve this multi-parameter constrained problem with a customized alternating direction method of multipliers (ADMM). The model is applied to TIR target tracking within the particle filter framework. To improve sampling effectiveness while reducing computation cost, a discriminative particle selection strategy based on a kernelized correlation filter is proposed to replace the previous random sampling when searching for useful candidates. Our proposed tracking method was tested on the VOT-TIR2016 benchmark. The experimental results show that the proposed method is significantly superior to various state-of-the-art methods in TIR target tracking.

