Step-by-step and integrated approaches in broadcast news speaker diarization

Abstract: This paper presents a theoretical framework to analyze the relative merits of the two most general, dominant approaches to speaker diarization involving bottom-up and top-down hierarchical clustering. We present an original qualitative comparison which argues how the two approaches are likely to exhibit different behavior in speaker inventory optimization and model training: bottom-up approaches will capture comparatively purer models and will thus be more sensitive to nuisance variation such as that related to the speech content; top-down approaches, in contrast, will produce less discriminative speaker models but, importantly, models which are potentially better normalized against nuisance variation. We report experiments conducted on two standard, single-channel NIST RT evaluation datasets which validate our hypotheses. Results show that competitive performance can be achieved with both bottom-up and top-down approaches (average DERs of 21% and 22%), and that neither approach is superior. Speaker purification, which aims to improve speaker discrimination, gives more consistent improvements with the top-down system than with the bottom-up system (average DERs of 19% and 25%), thereby confirming that the top-down system is less discriminative and that the bottom-up system is less stable. Finally, we report a new combination strategy that exploits the merits of the two approaches. Combination delivers an average DER of 17% and confirms the intrinsic complementarity of the two approaches.

“…Previous work would seem to support this observation [24]. We report our recent work on system combination in Section IV-E.…”

Section: Discrimination and Purification

Citation type: supporting

confidence: 51%

“…A number of combination approaches have been proposed previously, at the clustering stage [24], [31] or at the output stage [32]-[34]. Better performance is usually obtained but, with the exception of [35], none of the previous work considers the combination of both bottom-up and top-down system outputs without further re-segmentation.…”

Section: E Combination

Citation type: mentioning

confidence: 99%

A Comparative Study of Bottom-Up and Top-Down Approaches to Speaker Diarization

Evans

Bozonnet

Wang

et al. 2012

IEEE Trans. Audio Speech Lang. Process.

Self Cite

“…Those that apply segmentation to the MFCC stream, which might be uniform or based on the speaker change detection algorithms (see (Chen & Gopalakrishnan, 1998)), and those that do not apply such segmentation. Following the terminology of (Meignier et al., 2006) we will refer to the former branch as step-by-step algorithms, and to the latter as integrated algorithms. Both algorithmic approaches exploit a certain characteristic that the speaker labels exhibit, which is the temporal continuity.…”

Section: General Algorithmic Approaches

Citation type: mentioning

confidence: 99%

A Review of Recent Advances in Speaker Diarization with Bayesian Methods

Stafylakis

Katsouros

2011

Speech and Language Technologies

“…For FixSlidHAC pR, we first applied FixSlid with the threshold parameter pRange to segment the input audio stream, then we pruned non-speech regions within the audio segments and grouped the segments using HAC with multiple stages, which have been applied in state-of-the-art speaker diarization systems [8], [7], [35], [30]. As shown in Fig.…”

Section: B Experiments On Broadcast News Data 1) Data Set Description

Citation type: mentioning

confidence: 99%

BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization

Cheng

Wang

2010

IEEE Trans. Audio Speech Lang. Process.

Abstract: In this paper, we propose three divide-and-conquer approaches for BIC-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using ∆BIC, a widely adopted distance measure of two audio segments. We compare our approaches to three popular distance-based approaches, namely, Chen and Gopalakrishnan's window-growing-based approach, Siegler et al.'s fixed-size sliding window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
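
The ∆BIC measure named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each segment is modeled by a single Gaussian with a diagonal covariance (published systems commonly use full covariances), it adopts the sign convention in which a positive value favors two separate speakers, and the names `delta_bic`, `log_det_diag_cov` and `penalty_weight` are hypothetical.

```python
import math
import random


def log_det_diag_cov(frames):
    """Log-determinant of the maximum-likelihood diagonal covariance.

    `frames` is a list of equal-length feature vectors (e.g. MFCC frames).
    Assumes every dimension has non-zero variance.
    """
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(var)
    return total


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two adjacent segments (illustrative convention).

    Positive: two Gaussians fit better than one, suggesting a speaker change.
    Negative: one Gaussian suffices, suggesting the segments can be merged.
    """
    merged = seg_a + seg_b
    n_a, n_b, n = len(seg_a), len(seg_b), len(merged)
    dim = len(merged[0])
    # Log-likelihood gain from modeling the two segments separately.
    gain = 0.5 * (
        n * log_det_diag_cov(merged)
        - n_a * log_det_diag_cov(seg_a)
        - n_b * log_det_diag_cov(seg_b)
    )
    # Extra parameters of the second Gaussian: one mean and one variance per dimension.
    n_params = 2 * dim
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


def synthetic_segment(rng, mean, num_frames=200, dim=4):
    """Toy stand-in for real acoustic features: i.i.d. Gaussian frames."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(num_frames)]


if __name__ == "__main__":
    rng = random.Random(0)
    same_1 = synthetic_segment(rng, 0.0)
    same_2 = synthetic_segment(rng, 0.0)
    other = synthetic_segment(rng, 5.0)
    print("same source:     ", delta_bic(same_1, same_2))
    print("different source:", delta_bic(same_1, other))
```

In a segmentation or clustering loop, the sign of this value would decide whether two adjacent segments are merged; `penalty_weight` plays the role of the tunable penalty factor usually denoted λ.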

Cited by 114 publications

References 22 publications
