2013 IEEE Conference on Computer Vision and Pattern Recognition
DOI: 10.1109/cvpr.2013.321

Discriminative Segment Annotation in Weakly Labeled Video

Abstract: This paper tackles the problem of segment annotation in complex Internet videos. Given a weakly labeled video, we automatically generate spatiotemporal masks for each of the concepts with which it is labeled. This is a particularly relevant problem in the video domain, as large numbers of Internet videos are now available, tagged with the visual concepts that they contain. Given such weakly labeled videos, we focus on the problem of spatiotemporal segment classification. We propose a straightforward algorithm,…
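As a rough illustration of the weakly labeled setting the abstract describes (not the authors' algorithm, which the truncated abstract does not specify), the sketch below trains a discriminative classifier on per-segment features where every segment simply inherits its video-level tag. All data, dimensions, and names are synthetic placeholders; the point is only that background features occur on both sides of the weak split, so a discriminative model can still rank true concept segments above background.

```python
# Hypothetical sketch of weakly supervised segment classification.
# Synthetic features stand in for spatiotemporal segment descriptors;
# this is NOT the paper's algorithm, just the weak-label setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Segments from videos weakly tagged with the concept: a mix of true
# concept segments and background. Segments from untagged videos are
# all background, drawn from the same background distribution.
pos_video_segments = np.vstack([
    rng.normal(+1.0, 1.0, size=(50, 16)),   # concept segments
    rng.normal(-1.0, 1.0, size=(50, 16)),   # background in tagged videos
])
neg_video_segments = rng.normal(-1.0, 1.0, size=(100, 16))

# Weak labels: every segment inherits its video-level tag.
X = np.vstack([pos_video_segments, neg_video_segments])
y = np.concatenate([np.ones(len(pos_video_segments)),
                    np.zeros(len(neg_video_segments))])

# Train on the noisy labels, then score segments in the tagged videos.
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(pos_video_segments)[:, 1]
print("mean score, concept segments:    %.2f" % scores[:50].mean())
print("mean score, background segments: %.2f" % scores[50:].mean())
```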

Cited by 120 publications (164 citation statements: 0 supporting, 164 mentioning, 0 contrasting; citing publications span 2014–2020). References 29 publications. The citation statements below are ordered by relevance.
“…The co-localization problem is similar to co-segmentation [21-24, 38, 39, 50] and weakly supervised localization (WSL) [13, 17, 31, 32, 43-46, 49]. In contrast to co-segmentation, we seek to localize objects with bounding boxes rather than segmentations, which allows us to greatly decrease the number of variables in our problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…The dataset is built using the YouTube-Objects dataset [17], which consists of videos collected for 10 different object classes. We use this dataset because all the frames of a video have the object of interest segmented [10]. Therefore, these videos can be used as ground truth for evaluation.…”
Section: Dataset and Setup (mentioning)
confidence: 99%
“…We use the subset of the dataset described in Tang et al. [10]. The dataset is built using the YouTube-Objects dataset [17], which consists of videos collected for 10 different object classes.…”
Section: Dataset and Setup (mentioning)
confidence: 99%
“…Tang et al. [38] proposed a method to automatically annotate discriminative objects in weakly labeled videos. Jain et al. [39] represent discriminative video objects at the patch level.…”
Section: Related Work (mentioning)
confidence: 99%