2009
DOI: 10.1016/j.imavis.2008.04.018

Taking the bite out of automated naming of characters in TV video

Abstract: We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time stamped character annotation …

Cited by 164 publications (181 citation statements)
References 35 publications
“…For point tracking we use the KLT tracker [24] which uses optical flow to track sparse interest points for L frames, where L is a parameter. To determine whether two subsequent detection bounding boxes A and B belong to the same unique animal we use the intersection-over-union measure |A∩B| / |A∪B| > 0.5 of the set of point tracks through A and through B [7].…”
Section: Animal Counting
confidence: 99%
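The intersection-over-union test on point-track sets described above can be sketched as follows (a minimal illustration; the function and variable names are my own, not from the cited paper):

```python
def same_animal(tracks_a, tracks_b, threshold=0.5):
    """Decide whether two detections show the same animal by comparing
    the sets of point-track IDs passing through each bounding box."""
    a, b = set(tracks_a), set(tracks_b)
    if not (a | b):  # no tracks through either box: no evidence to link them
        return False
    iou = len(a & b) / len(a | b)  # |A∩B| / |A∪B| on track-ID sets
    return iou > threshold
```

For example, track-ID sets {1, 2, 3, 4} and {2, 3, 4, 5} share three of five tracks, giving an IoU of 0.6 and linking the two detections.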
“…In this section, an approach to face recognition based on facial feature localization from [21] is explained. This approach first detects facial features and then extracts local appearance-based descriptors where the features were found.…”
Section: Face Recognition
confidence: 99%
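The "detect features, then extract local appearance descriptors" idea can be sketched as below. This is not the exact descriptor of [21]; the patch size and the mean/variance normalisation are illustrative assumptions:

```python
import numpy as np

def local_appearance_descriptor(image, feature_points, patch_size=11):
    """Hypothetical sketch: cut a pixel patch around each detected facial
    feature, normalise it for illumination, and concatenate the patches
    into a single descriptor vector."""
    half = patch_size // 2
    parts = []
    for (x, y) in feature_points:
        patch = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
        patch = (patch - patch.mean()) / (patch.std() + 1e-8)  # zero-mean, unit-variance
        parts.append(patch.ravel())
    return np.concatenate(parts)
```

Matching two faces then reduces to comparing their descriptor vectors, e.g. by Euclidean distance.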
“…Everingham et al. [26,27] addressed the problem of automatically labeling faces of characters in TV or film material with their names. Similar to the "Faces in the News" labeling in [16], where detected frontal faces in news images are tagged with names appearing in the news story text, they proposed to combine visual cues (face and clothing) and textual cues (subtitles and transcripts) for assigning names.…”
Section: Face Retrieval In Video
confidence: 99%
“…They align the transcripts with subtitles using dynamic time warping to obtain textual annotation, and use visual speaker detection to resolve ambiguities, i.e., only associating names with face tracks where the face is detected as speaking. A nearest-neighbour [26] or SVM [27] classifier, trained on labeled tracks, is used to classify the unlabeled face tracks. Their approach has demonstrated promising performance on three 40-minute episodes of a TV serial.…”
Section: Face Retrieval In Video
confidence: 99%
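The transcript-subtitle alignment step can be illustrated with a minimal dynamic time warping over word sequences. This is a generic DTW sketch, not the authors' implementation; the unit substitution cost is an assumption:

```python
def dtw_align(subs, trans, cost=lambda a, b: 0 if a == b else 1):
    """Minimal DTW alignment of two word sequences.
    Returns the (i, j) index pairs on the optimal warping path,
    from (0, 0) to (len(subs)-1, len(trans)-1)."""
    n, m = len(subs), len(trans)
    INF = float("inf")
    # D[i][j] = minimal cost of aligning subs[:i] with trans[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost(subs[i - 1], trans[j - 1]) + min(
                D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, (i, j) = min((D[i - 1][j - 1], (i - 1, j - 1)),
                        (D[i - 1][j], (i - 1, j)),
                        (D[i][j - 1], (i, j - 1)))
    path.reverse()
    return path
```

In the papers' setting, the aligned path lets subtitle timestamps be transferred to transcript words, so the speaker names in the transcript acquire timings.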