2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00125

MoNet: Deep Motion Exploitation for Video Object Segmentation

Abstract: In this paper, we propose a novel MoNet model to deeply exploit motion cues for boosting video object segmentation performance from two aspects, i.e., frame representation learning and segmentation refinement. Concretely, MoNet exploits computed motion cue (i.e., optical flow) to reinforce the representation of the target frame by aligning and integrating representations from its neighbors. The new representation provides valuable temporal contexts for segmentation and improves robustness to various common con…
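The alignment step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not MoNet's implementation: the function names `warp_features` and `aggregate`, the flow convention (each target-frame pixel stores the `(dx, dy)` offset to its match in the neighbor frame), bilinear sampling, and plain averaging as the integration step are all assumptions made for illustration. The abstract only states that neighbor representations are aligned and integrated.

```python
import numpy as np

def warp_features(feat, flow):
    """Align a neighbor feature map to the target frame (illustrative only).

    feat: (H, W, C) neighbor-frame features.
    flow: (H, W, 2) offsets (dx, dy) from each target pixel to its match
          in the neighbor frame (assumed convention).
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions in the neighbor frame, clamped to the image border.
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Bilinear interpolation weights, broadcast over the channel axis.
    wx = (sx - x0)[..., None]
    wy = (sy - y0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def aggregate(target_feat, neighbor_feats, flows):
    """Average the target features with flow-aligned neighbor features.

    Uniform averaging is a placeholder for whatever integration the
    model actually uses.
    """
    warped = [warp_features(f, fl) for f, fl in zip(neighbor_feats, flows)]
    return np.mean([target_feat] + warped, axis=0)
```

With a zero flow field the warp is the identity, so aggregating a frame with an identical, unmoved neighbor returns the same features; a constant horizontal flow of one pixel shifts the neighbor's columns by one before averaging.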

Cited by 130 publications (62 citation statements)
References 33 publications
“…Location-sensitive embeddings used to refine an initial foreground prediction are explored in LSE [9]. MoNet [38] exploits optical flow motion cues by feature alignment and a distance transform layer. Using reinforcement learning to estimate a region of interest to be segmented is explored by Han et al [13].…”
Section: Related Work (mentioning)
confidence: 99%
“…For SVOS methods, the target object(s) is provided in the first frame and tracked automatically [60,8,5,68,2,69,64,71] or interactively by users [1] in the subsequent frames. Numerous algorithms were proposed based on graphical models [54], object proposals [46], supertrajectories [61], etc.…”
Section: Video Object Segmentation (mentioning)
confidence: 99%
“…Wang et al [33] proposed a global Gaussian distribution embedding network (G²DeNet), where one multivariate Gaussian, identified as a symmetric positive definite matrix of covariance matrix and mean vector [20], is plugged at network end. MoNet [38] proposed a sub-matrix square-root layer, making G²DeNet to have compact representation. In [3], the first-order information are combined with the second-order one which achieves consistent improvements over the standard bilinear networks on texture recognition.…”
Section: Related Work (mentioning)
confidence: 99%