2016
DOI: 10.1016/j.patrec.2016.01.025

Wildlife recognition in nature documentaries with weak supervision from subtitles and external data

Abstract: We propose a weakly supervised framework for domain adaptation in a multi-modal context for multi-label classification. This framework is applied to annotate objects such as animals in a target video with subtitles, in the absence of visual demarcators. We start from classifiers trained on external data (the source; in our setting, ImageNet) and iteratively adapt them to the target dataset using textual cues from the subtitles. Experiments on a challenging dataset of wildlife documentaries validate the framew…

Cited by 7 publications
(29 citation statements)
References 25 publications
“…The problem of aligning animals from videos with their mentions in subtitles has been studied in (Dusart et al, 2013) and (Venkitasubramanian et al, 2016). The former relies on hand-annotated bounding boxes to localize the animals in a frame, which are difficult to acquire.…”
Section: Related Work (mentioning)
confidence: 99%
“…Discretization can also be applied to the global representation used by the SVM, but as shown in (Venkitasubramanian et al, 2016), it is particularly useful in conjunction with a naive Bayes classifier.…”
mentioning
confidence: 99%
“…For the huge wildlife monitoring images with invalid images included, traditional method such as manual sorting is disadvantageous in terms of high labor intensity, low efficiency and unstable the recognition accuracy. Therefore automatic wildlife recognition [22] is demanded to improve the efficiency of wildlife monitoring. In recent years, various novel models such as AlexNet [23], VGG Net [24], Google Net [25], Deep Residual Net [26] and Dense Net [27] were proposed to improve wildlife recognition capability and achieve the automatic recognition of wildlife accurately.…”
Section: Introduction (mentioning)
confidence: 99%
“…Particularly in videos, language and vision are complementary to each other and it is essential to look at them in unison. Vision tasks often benefit from the associated text [42] while Natural Language Processing (NLP) tasks benefit from the vision. As an example, consider Figure 1.…”
Section: Introduction (mentioning)
confidence: 99%