Proceedings of the British Machine Vision Conference 2013
DOI: 10.5244/c.27.53
Unsupervised Object Discovery and Segmentation in Videos

Abstract: Unsupervised object discovery is the task of finding recurring objects in an unsorted set of images without any human supervision, a task that becomes increasingly important as the amount of visual data grows exponentially. Existing approaches typically build on still images and rely on various forms of prior knowledge to yield accurate results. In contrast, we propose a novel video-based approach that also allows exploiting motion information, a strong and physically valid indicator for foreground objects, t…

Cited by 12 publications (7 citation statements)
References 35 publications
“…At one end of the spectrum, fully supervised methods require careful annotation of object locations in the form of bounding boxes [16,34,32], segmentations [37] or even object part locations [33,38], which is costly and can frequently introduce inconsistency and ambiguity. On the other hand, unsupervised learning methods that do not require any supervision aim at finding similar objects in a set of unlabelled images [7,39] or videos [40]. They are, however, often limited to frequently occurring and visually consistent objects and are easily susceptible to background clutter.…”
Section: Related Work
confidence: 99%
“…On the other hand, labelled videos involving human activity, like pouring milk or eating cereal, are abundantly available. Such data, however, violates the principal assumption, since the prevalent themes of the video are now human body parts and background clutter instead of objects of interest, thus resulting in the failure of contemporary methods, as demonstrated in our experiments.…”
Section: Introduction
confidence: 99%
“…Therefore, there has been a surge in exploring unsupervised and weakly-supervised approaches for object detection. However, fully unsupervised approaches [30,17] without any annotations currently give considerably inferior performance on similar tasks, while conventional weakly-supervised methods [2,16,42] use static images to learn the detectors. These object detectors, however, fail to generalize to videos due to the shift in domain.…”
Section: Introduction
confidence: 99%
“…Like Gall (2014, 2017), we codetect small and medium-sized objects, but do so without a depth map or heavy dependence on human pose data. Like Schulter et al. (2013), we codetect both moving and stationary objects, but do so with a larger set of object classes and a larger video corpus. Also, like Ramanathan et al. (2014), we use sentences, but do so for a vocabulary that goes beyond the pronouns, nominals, and names that are used to codetect only human face tracks.…”
Section: Related Work
confidence: 99%
“…Schulter et al. (2013) construct a Conditional Random Field (CRF) in each input video frame with segmented superpixels as vertices. They use both motion and appearance information as unary potentials, and place binary edges between spatially and temporally neighboring superpixels.…”
Section: Related Work
confidence: 99%
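The CRF structure described in the last citation statement can be sketched in a few lines of code: superpixels become graph vertices, motion and appearance cues feed the unary potentials, and pairwise edges link spatially and temporally neighboring superpixels. The sketch below is a minimal, hypothetical illustration of that graph construction and its energy; the field names, the 0.5/0.5 cue weighting, the index-based temporal links, and the Potts pairwise term are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a superpixel CRF: vertices are (frame, superpixel)
# pairs, unaries combine motion and appearance cues, and edges connect
# spatial and temporal neighbors. All names and weights are illustrative.

def build_crf(frames):
    """frames: list of frames; each frame is a list of superpixel dicts with
    'motion' and 'appearance' scores in [0, 1] and 'spatial_neighbors'
    (indices of neighboring superpixels in the same frame)."""
    unaries = {}  # (t, i) -> cost of labelling the superpixel foreground
    edges = []    # pairwise edges as ((t, i), (t', j)) tuples

    for t, frame in enumerate(frames):
        for i, sp in enumerate(frame):
            # Unary: foreground is cheap when motion/appearance evidence is strong.
            unaries[(t, i)] = 0.5 * (1.0 - sp["motion"]) + 0.5 * (1.0 - sp["appearance"])
            # Spatial edges within the frame (i < j avoids duplicates).
            for j in sp["spatial_neighbors"]:
                if i < j:
                    edges.append(((t, i), (t, j)))
            # Temporal edge to the same superpixel index in the next frame --
            # a stand-in for correspondences that would come from optical flow.
            if t + 1 < len(frames) and i < len(frames[t + 1]):
                edges.append(((t, i), (t + 1, i)))
    return unaries, edges


def energy(labels, unaries, edges, pairwise_weight=1.0):
    """Total CRF energy: unary costs plus a Potts penalty for each pair of
    neighboring superpixels that disagree on their label (1 = foreground)."""
    e = sum(unaries[v] if labels[v] == 1 else 1.0 - unaries[v] for v in unaries)
    e += pairwise_weight * sum(labels[u] != labels[v] for u, v in edges)
    return e
```

Minimizing this energy (in practice with graph cuts or a similar inference method, rather than by enumeration) yields a foreground/background segmentation that is consistent both within each frame and across time.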