2003
DOI: 10.1117/12.533037

<title>Discovery and fusion of salient multimodal features toward news story segmentation</title>

citations
Cited by 27 publications
(48 citation statements)
references
References 11 publications
“…For example, field-to-studio shot transition is a salient story boundary cue. This is because many broadcast news programs follow a clear pattern: each news story starts with a studio shot and then moves to field shots [8]. An anchor face is another visual feature indicating a topic transition [9].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Discriminative models with specifically designed feature representation (e.g., bag of features [113], fisher scores [66]) and a similarity metric (e.g., EarthMover's Distance [116], string kernels [84]) have also shown good detection performance in domains like computational biology and text classification. Discriminative models have also been used to model video events such as story segmentation [63] or short-term events [40], [150], [154] with promising results.…”
Section: )
Type: mentioning
confidence: 99%
“…A news story is ‘a segment of a news broadcast with a coherent news focus which contains at least two independent, declarative clauses’ [139]. State-of-the-art detection algorithms achieve good segmentation results, with an F1 measure up to 0.74 [22], [27], [58], [63]. This is done by employing machine learning techniques such as SVM and HMM, along with judicious use of multimodal features such as shot length (production effect) or prosody in the anchor speech (content feature).…”
Section: ) Detecting Production Events
Type: mentioning
confidence: 99%
“…1(a). Multi-modal fusion for unsupervised learning differs from those for supervised learning [8] in that neither labeled ground-truth nor class separability is available as the computational criteria for guiding the fusion model. Therefore we use the data likelihood in generative models as an alternative criterion to optimize the multilevel dynamic mixture model.…”
Section: Layered Dynamic Mixture Model
Type: mentioning
confidence: 99%
“…A story is defined [6] as a segment of a news broadcast with a coherent news focus which contains at least two independent, declarative clauses. Shot boundaries in news can be reliably detected with over 90% accuracy, while state-of-the-art audio-visual story segmentation has an F1 measure ∼ 75% [8].…”
Section: Processing Multi-modal Input
Type: mentioning
confidence: 99%