2006
DOI: 10.1007/11788034_16

Annotating News Video with Locations

Abstract: The location of video scenes is an important semantic descriptor especially for broadcast news video. In this paper, we propose a learning-based approach to annotate shots of news video with locations extracted from video transcript, based on features from multiple video modalities including syntactic structure of transcript sentences, speaker identity, temporal video structure, and so on. Machine learning algorithms are adopted to combine multi-modal features to solve two sub-problems: (1) whether t…

Citations: cited by 10 publications (10 citation statements)
References: 8 publications (10 reference statements)
“…The task has to address different levels of semantic structure, from the frame, through shot and scene to the film or video stream as a complete entity, together with other semantically coherent sequences in the form of clips, episodes and news stories [2: pp. 10,35,36]. In the absence of exhaustive shot lists the minimalist nature of synopses in standard sources of reference makes them particularly blunt instruments for leveraging the full semantic content of these forms of information object.…”
Section: Image Indexing (mentioning)
confidence: 99%
“…If concepts such as 'water', 'sky', 'cars', 'faces' or 'outdoors' are relatively well-detected, concepts such as 'entertainment' or aspects such as preferences or moods Hanjalic (2006) are far from being correctly identified. To cope with the automatic concept detection challenges, the multimedia communities are currently establishing concept lexicons Naphade et al (2006); Snoek, Worring, van Gemert, Geusebroek & Smeulders (2006), focusing on the concepts that are feasible for automatic detection Hauptmann et al (2007); Snoek, Worring, Geusebroek, Koelma, Seinstra & Smeulders (2006); Yang & Hauptmann (2006) … Deselaers et al (2004); Jiang et al (2007)? As an answer to such a question, one may expect a set of descriptors that guarantees up to some degree of confidence that an effective retrieval system can be built on top of it.…”
Section: Features and Descriptors (mentioning)
confidence: 99%
“…The film or video stream as a complete entity is an image composite constructed from one or more scenes, although other contributory structures-"clips" and "episodes"-may be present (Del Bimbo, 1999, p. 10). In the particular case of news video footage-an application upon which heavy emphasis has been placed by the research community-the complete entity consists of a series of stories, where each story is a semantically coherent video sequence on a specific news event (Yang & Hauptmann, 2006). Typically, the semantic content of the film or video stream as a complete entity is represented as a synopsis, outline, or abstract.…”
Section: Generic Location (mentioning)
confidence: 99%
“…The problem has been investigated by Yang and Hauptmann (2006) with a view to satisfying such queries as "Find the scenes showing the flood in California caused by El Nino." Although news video may have transcripts from closed-captions or ASR in which reference is made to most of the locations shown in the footage, any given geographic location can have numerous visually different scenes, making determination of location from the visual content of a shot highly problematic.…”
Section: The CBIR Paradigm-Moving Images (mentioning)
confidence: 99%