Proceedings of the 2nd ACM International Workshop on Multimedia Analysis for Ecological Data 2013
DOI: 10.1145/2509896.2509905

Cross-modal alignment for wildlife recognition

Abstract: We propose an unsupervised framework for recognizing animals in videos using subtitles. In this framework, the alignment between animals and their names is performed using an Expectation Maximization algorithm, which is adapted to two very different circumstances: 1) when bounding boxes are available and 2) when the frame as a whole is used instead of bounding boxes. With the goal of maximizing precision, recall and F-measure, the experiments compare a multitude of natural language processing approaches and …
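As a rough illustration of the kind of name-to-visual alignment the abstract describes, the following is a minimal sketch of an Expectation Maximization loop in the style of IBM Model 1 from machine translation. It is not the authors' implementation: the function name `em_align`, the idea of representing each frame (or bounding box) by a discrete visual cluster id, and the toy data are all assumptions made for this example.

```python
def em_align(pairs, iterations=20):
    """Estimate t[name][visual], the probability of a visual cluster given a name.

    pairs: list of (names, visuals) tuples, one per frame. `names` are animal
    names found in the subtitles near the frame; `visuals` are hypothetical
    cluster ids for boxes or whole frames (an assumption of this sketch).
    """
    names = sorted({n for ns, _ in pairs for n in ns})
    visuals = sorted({v for _, vs in pairs for v in vs})

    # Start from a uniform distribution over visual clusters for every name.
    t = {n: {v: 1.0 / len(visuals) for v in visuals} for n in names}

    for _ in range(iterations):
        # E-step: share each visual among the co-occurring names in
        # proportion to the current estimate t[name][visual].
        counts = {n: {v: 0.0 for v in visuals} for n in names}
        for ns, vs in pairs:
            for v in vs:
                z = sum(t[n][v] for n in ns)
                if z == 0.0:
                    continue  # no name can currently explain this visual
                for n in ns:
                    counts[n][v] += t[n][v] / z

        # M-step: renormalize the expected counts for each name.
        for n in names:
            total = sum(counts[n].values())
            if total > 0.0:
                t[n] = {v: counts[n][v] / total for v in visuals}
    return t


# Toy data: one ambiguous frame and two unambiguous ones.
frames = [
    (["lion", "zebra"], ["c1", "c2"]),
    (["lion"], ["c1"]),
    (["zebra"], ["c2"]),
]
table = em_align(frames)
best = {name: max(dist, key=dist.get) for name, dist in table.items()}
```

In this toy run the unambiguous frames pull the ambiguous one toward the consistent pairing, so `best` maps `lion` to `c1` and `zebra` to `c2`. A real system would have to derive the visual representation from image features, which this sketch leaves out.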

Cited by 4 publications (17 citation statements)
References 14 publications (10 reference statements)
“…However, solutions proposed in the literature use a visual demarcator such as a bounding box obtained from a face detector. Moving on to the problem of recognizing animals in wildlife documentaries [4], with the current state-of-the-art, it is not feasible to train a sufficiently accurate animal detector, since the variety within the bounding boxes is too large. Acquiring these bounding boxes by hand is tedious.…”
Section: Introduction
confidence: 99%
“…Acquiring these bounding boxes by hand is tedious. Therefore, unlike [4], we are interested in a more realistic scenario where the bounding boxes are not available. In the absence of bounding boxes, the problem becomes much more challenging due to the following key issues: First, the presence of an animal is not known.…”
Section: Introduction
confidence: 99%
“…The dataset used in our experiments is that of (Dusart et al., 2013). This is a wildlife documentary named 'Great Wildlife Moments' with subtitles from the BBC.…”
Section: Experiments and Results
confidence: 99%
“…The problem of aligning animals from videos with their mentions in subtitles has been studied in (Dusart et al., 2013) and (Venkitasubramanian et al., 2016). The former relies on hand-annotated bounding boxes to localize the animals in a frame, which are difficult to acquire.…”
Section: Related Work
confidence: 99%