2010 IEEE International Workshop on Multimedia Signal Processing (MMSP 2010)
DOI: 10.1109/mmsp.2010.5662015

Unsupervised detection of multimodal clusters in edited recordings

Cited by 6 publications (12 citation statements). References 19 publications.

“…Note that this paper differs from [11] in two major ways. First, we propose a method that does not need an initial partitioning of the audio and visual data.…”
Section: Introduction (mentioning)
confidence: 90%
“…This idea is introduced in [12], where the single most consistent pair of clusters is selected according to heuristics on the pattern of occurrence of structurally relevant events. A rather similar philosophy is used in [11] to select pairs of segments, assuming each segment is labeled. The key difference is that [11] uses a unique segmentation in each modality, with cluster labels attached to segments, rather than a nested hierarchy of clusters.…”
Section: Overview (mentioning)
confidence: 99%
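
The pair-selection idea described in that statement can be illustrated with a minimal sketch: score every (audio cluster, visual cluster) pair by how often their event occurrences coincide, and keep the most consistent pair. The cluster names, the time tolerance, and the scoring rule below are illustrative assumptions, not the exact heuristics of [12].

```python
# Minimal sketch of cross-modal cluster-pair selection by co-occurrence.
# Names, tolerance, and the scoring rule are illustrative assumptions,
# not the exact heuristics of [12].

def cooccurrence_score(audio_times, visual_times, tol=0.5):
    """Fraction of audio event times matched by some visual event time
    within `tol` seconds (a simple stand-in consistency measure)."""
    if not audio_times:
        return 0.0
    hits = sum(any(abs(a - v) <= tol for v in visual_times)
               for a in audio_times)
    return hits / len(audio_times)

def most_consistent_pair(audio_clusters, visual_clusters):
    """Return the (audio_id, visual_id) pair whose occurrences coincide
    most often; each dict maps a cluster id to a list of event times."""
    best_pair, best_score = None, -1.0
    for a_id, a_times in audio_clusters.items():
        for v_id, v_times in visual_clusters.items():
            score = cooccurrence_score(a_times, v_times)
            if score > best_score:
                best_pair, best_score = (a_id, v_id), score
    return best_pair, best_score

if __name__ == "__main__":
    audio = {"jingle": [10.0, 55.2, 120.4], "speech": [5.0, 30.0]}
    visual = {"logo": [10.1, 55.0, 120.6], "anchor": [5.5, 31.0, 80.0]}
    print(most_consistent_pair(audio, visual))  # (('jingle', 'logo'), 1.0)
```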
“…However, the association between speech and face can introduce many ambiguities in the case of multi-face shots, as shown in the first image. Related work: earlier work on AV person diarization performs audio and video clustering separately in a first step and associates the clusters in a second step [2,3,4]. The simplest clue for associating faces and speakers is their temporal co-occurrence.…”
Section: Introduction (mentioning)
confidence: 99%
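
The temporal co-occurrence clue mentioned in this last statement can be sketched as follows: represent each speaker (audio) cluster and each face (visual) cluster as a list of time segments, accumulate pairwise overlap durations, and greedily pair each speaker with the face it co-occurs with most. The data layout and the greedy assignment are assumptions for illustration, not the exact method of [2,3,4].

```python
# Minimal sketch of speaker-face association by temporal co-occurrence.
# The segment representation and greedy assignment are illustrative
# assumptions, not the exact method of [2,3,4].

def overlap(seg_a, seg_b):
    """Overlap duration (seconds) between two (start, end) segments."""
    return max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))

def cooccurrence_matrix(speakers, faces):
    """Total overlap between every speaker cluster and face cluster;
    each cluster is a list of (start, end) segments."""
    return {
        (s_id, f_id): sum(overlap(s, f) for s in s_segs for f in f_segs)
        for s_id, s_segs in speakers.items()
        for f_id, f_segs in faces.items()
    }

def associate(speakers, faces):
    """Greedily pair each speaker with its most co-occurring face;
    ambiguous in multi-face shots, as the statement above notes."""
    m = cooccurrence_matrix(speakers, faces)
    return {s_id: max(faces, key=lambda f_id: m[(s_id, f_id)])
            for s_id in speakers}

if __name__ == "__main__":
    speakers = {"spk1": [(0, 10), (20, 30)], "spk2": [(10, 20)]}
    faces = {"faceA": [(0, 12), (19, 31)], "faceB": [(9, 21)]}
    print(associate(speakers, faces))  # {'spk1': 'faceA', 'spk2': 'faceB'}
```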