2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.114
Surveillance Video Parsing with Single Frame Supervision

Abstract: Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications [38,9]. However, pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (i) roughly parses the frames within the video segment, (ii) estimates t…

Cited by 58 publications (36 citation statements)
References 53 publications
“…Different from fc layers, where the input and output are of fixed size, FCN can output dense predictions from arbitrary-sized inputs. Therefore, FCN is widely used in segmentation [28,27], image restoration [9], and dense object detection windows [34]. In particular, our PPR-FCN is inspired by another benefit of FCN utilized in R-FCN [23]: per-RoI computation can be shared by convolutions.…”
Section: Related Work
Confidence: 99%
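The quoted passage hinges on why fully convolutional networks accept arbitrary-sized inputs: the classifier is applied identically at every spatial location, so no layer fixes the input resolution. A minimal illustrative sketch (not the PPR-FCN or R-FCN architecture itself; `fcn_head` and its shapes are assumptions for illustration) is a 1x1-convolution head, which is just a per-pixel linear map:

```python
import numpy as np

def fcn_head(features, weights):
    """A 1x1-convolution classifier head: a per-pixel linear map.

    features: (H, W, Cin) feature map from a convolutional backbone.
    weights:  (Cin, Cout) classifier weights shared across all locations.

    Because the same weights are applied at every spatial position, the
    head accepts feature maps of ANY height and width -- the property the
    quoted passage contrasts with fixed-size fc layers.
    """
    return features @ weights  # matmul broadcasts over the (H, W) grid

# Dense predictions at two different input sizes with one set of weights.
w = np.random.randn(8, 5)
out_small = fcn_head(np.random.randn(4, 6, 8), w)    # shape (4, 6, 5)
out_large = fcn_head(np.random.randn(10, 3, 8), w)   # shape (10, 3, 5)
```

An fc layer would require flattening to a fixed vector length, so changing H or W would break its weight shape; the 1x1-conv head has no such constraint.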
“…Besides, given a video as the parsing target, the adjacent frames can also provide significant clues to assist the parsing task. For example, Liu et al. [36] developed a Single frame Video Parsing method, which fuses the parsing results of adjacent frames with the assistance of optical flow maps, and parses the whole video with only one labelled frame per video in the training stage.…”
Section: Related Work
Confidence: 99%
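The flow-guided fusion described above can be sketched in two steps: warp the previous frame's label probabilities into the current frame along the optical flow, then blend them with the current frame's own prediction. This is a minimal sketch of that idea, not the SVP method itself; the function names, nearest-neighbour sampling, and the fixed blending weight `alpha` are all simplifying assumptions:

```python
import numpy as np

def warp_labels(prev_probs, flow):
    """Warp per-pixel label probabilities from the previous frame into the
    current frame using a backward optical-flow field.

    prev_probs: (H, W, C) label probabilities of the previous frame.
    flow:       (H, W, 2) flow mapping each current-frame pixel (y, x)
                to its source location in the previous frame.
    """
    h, w, _ = prev_probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour sampling for brevity; real systems typically
    # use bilinear interpolation of the warped probabilities.
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, w - 1)
    return prev_probs[src_y, src_x]

def fuse(curr_probs, warped_probs, alpha=0.5):
    """Blend the current frame's parsing with the flow-warped previous one."""
    return alpha * curr_probs + (1 - alpha) * warped_probs
```

With zero flow, warping is the identity, so fusing a frame's prediction with its own warped copy returns it unchanged; non-zero flow lets labels from the previous frame follow moving body parts into the current frame.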
“…The estimated poses provide the shape prior, which is necessary for segmentation. Similarly, the authors of Reference [35] proposed to integrate parsing with optical flow estimation. The authors of Reference [36] incorporated a self-supervised joint loss to ensure the consistency between parsing and pose.…”
Section: Related Work
Confidence: 99%