2012
DOI: 10.1109/tasl.2011.2159710

A Comparative Study of Bottom-Up and Top-Down Approaches to Speaker Diarization

Abstract: This paper presents a theoretical framework to analyze the relative merits of the two most general, dominant approaches to speaker diarization involving bottom-up and top-down hierarchical clustering. We present an original qualitative comparison which argues how the two approaches are likely to exhibit different behavior in speaker inventory optimization and model training: bottom-up approaches will capture comparatively purer models and will thus be more sensitive to nuisance variation such as that …

Citations: cited by 37 publications (27 citation statements)
References: 26 publications (40 reference statements)
“…The top-down approach is reported to give worse performance on the NIST RT database [25] and has thus received less attention. However, paper [40] makes a thorough comparative study of these two approaches and demonstrates that these two approaches have similar performance.…”
Section: Top-down Approach (mentioning)
confidence: 99%
“…Although random initialization works well in most cases, LCM and VB systems tend to assign the segments to each speaker evenly in the case where a single speaker dominates the whole conversation, leading to poor results. According to the comparative study [40], we know that the bottom-up approach will capture comparatively purer models. Therefore, we recommend an informative AHC initialization method, similar to our previous paper [51].…”
Section: AHC Initialization (mentioning)
confidence: 99%
“…Speaker diarization [1][2][3] is an unsupervised statistical pattern recognition task which aims to determine 'who spoke when' in a given audio stream. Speaker diarization has become a key, enabling technology in a wide variety of tasks including document processing, structuring and navigation, information retrieval, meta-data extraction and copyright detection.…”
Section: Introduction (mentioning)
confidence: 99%
“…Historically, the state of the art in speaker diarization for meetings has evolved around the implementation of offline systems, such as bottom-up and top-down hierarchical clustering approaches [3][4][5]. In both cases, speakers are modelled with Gaussian mixture models (GMMs) which are interconnected to form an ergodic hidden Markov model (HMM) in which the transitions represent speaker turns.…”
Section: Introduction (mentioning)
confidence: 99%
“…As for speaker diarization, many research works are based on agglomerative and divisive hierarchical manner such as top-down or bottom-up algorithms [2]. The bottom-up approach is by far the most popular system, that is, hierarchical agglomerative clustering (HAC).…”
Section: Introduction (mentioning)
confidence: 99%