Technical Improvements of the E-HMM Based Speaker Diarization System for Meeting Records

Fredouille, Corinne; Senay, Grégory

doi:10.1007/11965152_32

Cited by 18 publications

(11 citation statements)

References 9 publications

“…Model-based approaches tend to perform better and rely on a two-class detector, with models pre-trained with external speech and non-speech data [6], [41], [49], [51], [52]. Speech and non-speech models may optionally be adapted to specific meeting conditions [15].…”

Section: B. Speech Activity Detection (mentioning, confidence: 99%)
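
The two-class, model-based speech activity detection described in the citation statement above can be sketched as follows. This is a minimal illustration, not the detector of any cited system. It assumes one scalar feature per frame (for instance log energy), uses a single Gaussian per class in place of richer pre-trained models, and makes a log-likelihood-ratio decision followed by majority smoothing. `Gaussian1D`, `detect_speech`, `threshold` and `smooth` are hypothetical names.

```python
import math
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class Gaussian1D:
    """Single Gaussian over a scalar frame feature (hypothetical stand-in for a richer model)."""
    mu: float
    var: float

    @classmethod
    def fit(cls, xs):
        # Maximum-likelihood estimate with a small variance floor.
        return cls(mean(xs), max(pvariance(xs), 1e-6))

    def log_pdf(self, x):
        return -0.5 * (math.log(2 * math.pi * self.var) + (x - self.mu) ** 2 / self.var)


def detect_speech(features, speech, nonspeech, threshold=0.0, smooth=3):
    """Return one bool per frame: True where speech is more likely than non-speech.

    A majority filter of odd width `smooth` then removes isolated label flips.
    """
    raw = [speech.log_pdf(x) - nonspeech.log_pdf(x) > threshold for x in features]
    half = smooth // 2
    out = []
    for i in range(len(raw)):
        window = raw[max(0, i - half): i + half + 1]
        out.append(2 * sum(window) > len(window))
    return out


# "Pre-trained" models: here simply fitted on a few labelled example frames.
speech_model = Gaussian1D.fit([8.0, 9.0, 10.0, 9.5])
nonspeech_model = Gaussian1D.fit([1.0, 2.0, 1.5, 0.5])
print(detect_speech([1.0, 9.0, 9.2, 1.2, 8.8, 1.0, 1.1], speech_model, nonspeech_model))
# [False, True, True, True, False, False, False]
```

In the example, smoothing bridges the one-frame gap (1.2) inside the speech run and discards the isolated speech-like frame (8.8). The adaptation to meeting conditions that the quotation mentions could be approximated in this toy setting by re-fitting the two models on frames from the recording itself.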

Speaker Diarization: A Review of Recent Research

Anguera, Bozonnet, Evans, et al. 2012

IEEE Trans. Audio Speech Lang. Process.

563

349

Abstract: Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become a key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper we review the current state of the art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.
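
The NIST Rich Transcription evaluations mentioned in the abstract score diarization systems with the diarization error rate (DER). DER is missed speech, false-alarm speech and speaker-confusion time, divided by the total reference speech time. The sketch below is a simplified frame-level version. It assumes one speaker per frame (no overlap), no forgiveness collar, and a brute-force speaker mapping. The official NIST scoring tool handles overlap and collars and is not reproduced here, and `frame_der` is a hypothetical name.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate (simplified: no overlap, no collar).

    Both arguments hold one label per frame: a speaker label, or None for
    non-speech. Hypothesis labels need not use the reference's names.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    scored = sum(r is not None for r in reference)
    if scored == 0:
        raise ValueError("reference contains no speech")
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_spk = list({r for r, _ in both})
    hyp_spk = list({h for _, h in both})
    # One-to-one hypothesis -> reference mapping that maximises matched frames
    # (brute force over permutations; fine for a handful of speakers).
    if len(hyp_spk) <= len(ref_spk):
        candidates = (dict(zip(hyp_spk, p)) for p in permutations(ref_spk, len(hyp_spk)))
    else:
        candidates = (dict(zip(p, ref_spk)) for p in permutations(hyp_spk, len(ref_spk)))
    best = 0
    for mapping in candidates:
        best = max(best, sum(mapping.get(h) == r for r, h in both))
    confusion = len(both) - best
    return (missed + false_alarm + confusion) / scored


reference = ["A", "A", "B", "B", None, "B"]
hypothesis = ["s1", "s1", "s1", "s2", "s2", None]
print(frame_der(reference, hypothesis))  # 0.6: 1 missed + 1 false alarm + 1 confused, over 5 speech frames
```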

“…Gauvain et al. [1998] and Sinha et al. [2005] represented each fixed-length window with a Gaussian distribution and measured the distance between those windows using divergence. The clustering step is achieved in a bottom-up fashion, that is, agglomerative clustering [Gauvain et al. 1998; Ajmera and Wooters 2003; Reynolds and Torres-Carrasquillo 2004; Ben et al. 2004], or in a top-down manner, that is, divisive clustering (e.g., evolutive hidden Markov models in Fredouille and Senay [2006]). …”

Section: Related Work (mentioning, confidence: 99%)
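
A minimal sketch of the bottom-up variant described in this quotation, assuming one scalar feature per frame. Each fixed-length window is modelled by a Gaussian, clusters are compared with the symmetric Kullback-Leibler divergence, and the closest pair is merged until the smallest distance exceeds a threshold. Two details are illustrative assumptions, not the criteria used by the cited systems: re-fitting a Gaussian on the pooled frames of a cluster, and the `threshold` stopping rule. The function names are also hypothetical.

```python
from statistics import mean, pvariance


def fit(xs):
    """ML 1-D Gaussian (mean, variance) with a variance floor."""
    return mean(xs), max(pvariance(xs), 1e-6)


def sym_kl(p, q):
    """Symmetric Kullback-Leibler divergence between 1-D Gaussians (mean, var)."""
    (mp, vp), (mq, vq) = p, q
    d2 = (mp - mq) ** 2
    # KL(p||q) + KL(q||p); the log-variance terms cancel.
    return 0.5 * ((vp + d2) / vq + (vq + d2) / vp - 2.0)


def agglomerative(features, win, threshold):
    """Return one cluster label per complete window of `win` frames."""
    windows = [features[i:i + win] for i in range(0, len(features) - win + 1, win)]
    clusters = [[i] for i in range(len(windows))]  # every window starts alone

    def pooled(c):
        return fit([x for i in c for x in windows[i]])

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sym_kl(pooled(clusters[a]), pooled(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # closest clusters are too far apart: stop merging
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(windows)
    for k, c in enumerate(clusters):
        for i in c:
            labels[i] = k
    return labels


features = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05,
            0.02, -0.05, 0.1, 0.0, 5.02, 4.95, 5.1, 5.0]
print(agglomerative(features, win=4, threshold=10.0))  # [0, 1, 0, 1]
```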

Identification of Soundbite and Its Speaker Name Using Transcripts of Broadcast News Speech

Liu

2010

ACM Transactions on Asian Language Information Processing

This article presents a pipeline framework for identifying soundbites and their speaker names from Mandarin broadcast news transcripts. Both modules, soundbite segment detection and soundbite speaker name recognition, are based on a supervised classification approach using multiple linguistic features. We systematically evaluated performance for each module as well as for the entire system, and investigated the effect of using automatic speech recognition (ASR) output and automatic sentence segmentation. We found that both components affect the pipeline system, with automatic speaker name recognition errors degrading overall system performance more than soundbite segment detection errors. In addition, our experimental results show that using ASR output degrades system performance significantly, and that automatic sentence segmentation greatly affects soundbite detection but has much less effect on speaker name recognition. ACM Reference Format: Liu, F. and Liu, Y. 2010. Identification of soundbite and its speaker name using transcripts of broadcast news speech.

“…Mostly metric-based segmentation and clustering is being used [Cas04, vL06, … [Cas04] have been used for segmentation or clustering. A hierarchical top-down clustering approach was taken by [FS07] using an HMM that adds new states (representing speakers) at each iteration while re-segmenting the speech data. In [ZBLG07, AWP07] a bottom-up approach was taken, also using an HMM to realign the data after each iteration.…”

Section: Assessment of Speaker Diarization Systems (mentioning, confidence: 99%)
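
The top-down strategy attributed to [FS07] can be illustrated with the sketch below: an HMM gains one speaker state per iteration, and the data are re-segmented each time. This is not the published E-HMM system. It assumes scalar features and one Gaussian per speaker. The seeding rule (the worst-explained window), the speaker-change `penalty`, the `min_gain` stopping rule and all function names are hypothetical choices made for brevity.

```python
import math
from statistics import mean, pvariance


def fit(xs):
    """ML 1-D Gaussian (mean, variance) with a variance floor."""
    return mean(xs), max(pvariance(xs), 1e-3)


def log_pdf(x, g):
    mu, var = g
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def viterbi(xs, states, penalty):
    """Best speaker sequence when each speaker change costs `penalty` (log domain)."""
    n = len(states)
    score = [log_pdf(xs[0], g) for g in states]
    back = []
    for x in xs[1:]:
        best = max(range(n), key=score.__getitem__)
        ptr, new = [], []
        for k in range(n):
            # Either stay in speaker k or switch from the best-scoring speaker.
            prev = k if score[k] >= score[best] - penalty else best
            ptr.append(prev)
            cost = 0.0 if prev == k else penalty
            new.append(score[prev] - cost + log_pdf(x, states[k]))
        score = new
        back.append(ptr)
    k = max(range(n), key=score.__getitem__)
    total, path = score[k], [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1], total


def reestimate(xs, path, states):
    """Refit each speaker on the frames it owns; keep the old model if it owns < 2."""
    out = []
    for j, g in enumerate(states):
        owned = [x for x, k in zip(xs, path) if k == j]
        out.append(fit(owned) if len(owned) >= 2 else g)
    return out


def top_down(xs, seed_len=5, penalty=20.0, min_gain=50.0, max_speakers=8, iters=3):
    states = [fit(xs)]  # start: a single speaker explains everything
    path = [0] * len(xs)
    total = sum(log_pdf(x, states[0]) for x in xs)
    while len(states) < max_speakers and len(xs) >= seed_len:
        # Hypothetical seeding rule: the window worst explained by its
        # currently assigned speaker seeds a new speaker state.
        def window_ll(i):
            return sum(log_pdf(xs[j], states[path[j]]) for j in range(i, i + seed_len))
        s = min(range(len(xs) - seed_len + 1), key=window_ll)
        cand = states + [fit(xs[s:s + seed_len])]
        # Alternate Viterbi re-segmentation and model re-estimation.
        cand_path, cand_total = viterbi(xs, cand, penalty)
        for _ in range(iters):
            cand = reestimate(xs, cand_path, cand)
            cand_path, cand_total = viterbi(xs, cand, penalty)
        if cand_total - total < min_gain:
            break  # the extra speaker does not pay off: keep the previous set
        states, path, total = cand, cand_path, cand_total
    return path


xs = [0.0, 0.5, -0.5] * 10 + [10.0, 10.5, 9.5] * 10 + [0.0, 0.5, -0.5] * 10
labels = top_down(xs)
print(labels[::10])  # [0, 0, 0, 1, 1, 1, 0, 0, 0]
```

On this toy signal the second speaker is accepted because it raises the log-likelihood far more than the two change penalties it costs. A third candidate brings no gain, so the loop stops.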

“…Some systems just picked a single channel [vL06, ZBLG07], while others segmented each channel separately [JLSW04] before combining the results or performed some form of pre-processing [FS07, AWP07] to combine the channels into one single recording.…”

Section: NIST Benchmark Series for Speaker Diarization (mentioning, confidence: 99%)
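
The quotation does not say which pre-processing [FS07] or [AWP07] used to merge the microphone channels. Delay-and-sum is one common choice, shown here only as an illustrative assumption. Each channel is shifted by the integer lag that maximises its cross-correlation with a reference channel, and the aligned channels are then averaged. The function names and `max_lag` are hypothetical. Practical systems typically estimate delays block by block with more robust correlation measures than the plain cross-correlation used here.

```python
def best_lag(ref, sig, max_lag):
    """Integer lag (samples) of `sig` relative to `ref` maximising cross-correlation."""
    def xcorr(lag):
        return sum(ref[n] * sig[n + lag] for n in range(len(ref)) if 0 <= n + lag < len(sig))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def delay_and_sum(channels, max_lag=10):
    """Align every channel to channels[0] and average them sample by sample."""
    ref = channels[0]
    lags = [best_lag(ref, ch, max_lag) for ch in channels]
    out = []
    for n in range(len(ref)):
        # Only channels that have a sample at the shifted position contribute.
        vals = [ch[n + lag] for ch, lag in zip(channels, lags) if 0 <= n + lag < len(ch)]
        out.append(sum(vals) / len(vals))
    return out, lags


mic1 = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
mic2 = [0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0]  # same pulse, 3 samples later
combined, lags = delay_and_sum([mic1, mic2])
print(lags)  # [0, 3]
```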

Quality of service modeling and analysis for carrier ethernet

Huijbregts
