2012
DOI: 10.1142/s1793351x12400077

Speech Shot Extraction From Broadcast News Videos

Abstract: We propose a method for discriminating between a speech shot and a narrated shot to extract genuine speech shots from a broadcast news video. Speech shots in news videos contain a wealth of multimedia information of the speaker, and could thus be considered valuable as archived material. In order to extract speech shots from news videos, there is an approach that uses the position and size of a face region. However, it is difficult to extract them with only such an approach, since news videos contain non-speec…

Citations: cited by 5 publications (3 citation statements)
References: 18 publications
“…To solve this problem, as a special case, we should consider using the speaker's original voice when the selected shot contains a monologue, instead. This could be detected by, for example, Kumagai et al's method [17]. For sentence #3 ( Fig.…”
Section: Results (citation type: mentioning)
confidence: 98%
“…Although in their work, it is shown that this approach is effective to some extent, if we do not consider the more high-level visual contents actually present in a scene, it will limit the cases that it could handle properly. Recently, Kumagai et al attempted to detect such inconsistency in news videos based on the relation between audio-visual features [17], but it could only handle monologue (speech) scenes.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Thus, we have developed a method that automatically learns and excludes voices of the anchorperson and the reporters according to specific keywords in the CC [12] and a method that learns the correlation of the features between the lip shape and the audio [13], in order to detect monologue scenes. See corresponding references for details of the works.…”
Section: Detection Of Monologue Scenes (citation type: mentioning)
confidence: 99%