2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.114
Surveillance Video Parsing with Single Frame Supervision

Abstract: Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications [38,9]. However, pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (i) roughly parses the frames within the video segment, (ii) estimates t…

Cited by 58 publications (36 citation statements)
References 53 publications
“…Different from fc layers, where the input and output are of fixed size, FCN can output dense predictions from arbitrary-sized inputs. Therefore, FCN is widely used in segmentation [28,27], image restoration [9], and dense object detection windows [34]. In particular, our PPR-FCN is inspired by another benefit of FCN utilized in R-FCN [23]: per-RoI computation can be shared by convolutions.…”
Section: Related Work
Confidence: 99%
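The quoted passage hinges on why fully convolutional networks accept arbitrary-sized inputs: the classifier is applied identically at every spatial location, so no layer fixes the input resolution. A minimal illustrative sketch (not the PPR-FCN or R-FCN architecture itself; `fcn_head` and its shapes are assumptions for illustration) is a 1x1-convolution head, which is just a per-pixel linear map:

```python
import numpy as np

def fcn_head(features, weights):
    """A 1x1-convolution classifier head: a per-pixel linear map.

    features: (H, W, Cin) feature map from a convolutional backbone.
    weights:  (Cin, Cout) classifier weights shared across all locations.

    Because the same weights are applied at every spatial position, the
    head accepts feature maps of ANY height and width -- the property the
    quoted passage contrasts with fixed-size fc layers.
    """
    return features @ weights  # matmul broadcasts over the (H, W) grid

# Dense predictions at two different input sizes with one set of weights.
w = np.random.randn(8, 5)
out_small = fcn_head(np.random.randn(4, 6, 8), w)    # shape (4, 6, 5)
out_large = fcn_head(np.random.randn(10, 3, 8), w)   # shape (10, 3, 5)
```

An fc layer would require flattening to a fixed vector length, so changing H or W would break its weight shape; the 1x1-conv head has no such constraint.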
“…Besides, given a video as the parsing target, the adjacent frames can also provide significant clues to assist the parsing task. For example, Liu et al. [36] developed a Single frame Video Parsing method, which fuses the parsing results of adjacent frames with the assistance of optical flow maps, and parses the whole video with only one labelled frame per video in the training stage.…”
Section: Related Work
Confidence: 99%
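The flow-guided fusion described above can be sketched in two steps: warp the previous frame's label probabilities into the current frame along the optical flow, then blend them with the current frame's own prediction. This is a minimal sketch of that idea, not the SVP method itself; the function names, nearest-neighbour sampling, and the fixed blending weight `alpha` are all simplifying assumptions:

```python
import numpy as np

def warp_labels(prev_probs, flow):
    """Warp per-pixel label probabilities from the previous frame into the
    current frame using a backward optical-flow field.

    prev_probs: (H, W, C) label probabilities of the previous frame.
    flow:       (H, W, 2) flow mapping each current-frame pixel (y, x)
                to its source location in the previous frame.
    """
    h, w, _ = prev_probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour sampling for brevity; real systems typically
    # use bilinear interpolation of the warped probabilities.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prev_probs[src_y, src_x]

def fuse(curr_probs, warped_probs, alpha=0.5):
    """Blend the current frame's parsing with the flow-warped previous one."""
    return alpha * curr_probs + (1 - alpha) * warped_probs
```

With zero flow, warping is the identity, so fusing a frame's prediction with its own warped copy returns it unchanged; non-zero flow lets labels from the previous frame follow moving body parts into the current frame.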
“…The estimated poses provide the shape prior, which is necessary for segmentation. Similarly, the authors of Reference [35] proposed to integrate parsing with optical flow estimation. The authors of Reference [36] incorporated a self-supervised joint loss to ensure the consistency between parsing and pose.…”
Section: Related Work
Confidence: 99%