Speaker Diarization Error Analysis Using Oracle Components

Huijbregts, Marijn; Leeuwen, David A. van; Wooters, Chuck

doi:10.1109/tasl.2011.2162318

Cited by 14 publications

(14 citation statements)

References 16 publications

“…In standard speaker diarization systems, which are based on iterative segmenting and clustering [23,12], each speaker is modeled by a GMM model and the segmentation is done using HMM-Viterbi decoding. More specifically, the system starts with K clusters after front-end acoustic processing and removing non-speech segments.…”

Section: Results (mentioning)

confidence: 99%

Automatic Signer Diarization - The Mover Is the Signer Approach

Gebre

Wittenburg

Heskes

2013

2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops

We present a vision-based method for signer diarization, the task of automatically determining "who signed when?" in a video. This task has similar motivations and applications as speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more movements than their interlocutors. Experiments on four videos (a total of 1.4 hours and each consisting of two signers) show the applicability of the method. The best diarization error rate (DER) obtained is 0.16.

“…Previous studies have shown that the error rates of automatic speech processing systems increase when processing speech from multiple simultaneous speakers [8], [3]. Several diagnostical studies on speaker diarization systems have also shown that overlapping speech is one of the main sources of error in state of the art speaker diarization systems [9], [10], [11]. Several previous works have proposed methods to detect overlapping speech in meeting room conversations.…”

Section: Introduction (mentioning)

confidence: 99%

Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations

Yella

Bourlard

2014

IEEE/ACM Trans. Audio Speech Lang. Process.

Overlapping speech has been identified as one of the main sources of errors in diarization of meeting room conversations. Therefore, overlap detection has become an important step prior to speaker diarization. Studies on conversational analysis have shown that overlapping speech is more likely to occur at specific parts of a conversation. They have also shown that overlap occurrence is correlated with various conversational features such as speech, silence patterns and speaker turn changes. We use features capturing this higher level information from the structure of a conversation, such as silence and speaker change statistics, to improve an acoustic feature-based classifier of overlapping and single-speaker speech classes. The silence and speaker change statistics are computed over a long-term window (around 3-4 seconds) and are used to predict the probability of overlap in the window. These estimates are then incorporated into an acoustic feature-based classifier as prior probabilities of the classes. Experiments conducted on three corpora (AMI, NIST-RT and ICSI) have shown that the proposed method improves the performance of the acoustic feature-based overlap detector on all the corpora. They also reveal that the model based on long-term conversational features used to estimate the probability of overlap, which is learned from the AMI corpus, generalizes to meetings from other corpora (NIST-RT and ICSI). Moreover, experiments on the ICSI corpus reveal that the proposed method also improves laughter overlap detection. Consequently, applying overlap handling techniques to speaker diarization using the detected overlap results in a reduction of diarization error rate (DER) on all three corpora.

“…Several diagnostical studies were done to isolate the main sources of errors in speaker diarization systems [11,12,13]. These studies have shown that the significant sources of errors in a typical diarization system come from overlapping speech segments and errors in speech/non-speech detection.…”

Section: Introduction (mentioning)

confidence: 99%

Information bottleneck based speaker diarization of meetings using non-speech as side information

Yella

Bourlard

2014

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Background noise and errors in speech/non-speech detection cause significant degradation to the output of a speaker diarization system. In a typical speaker diarization system, non-speech segments are excluded prior to unsupervised clustering. In the current study, we exploit the information present in the non-speech segments of a recording to improve the output of the speaker diarization system based on the information bottleneck framework. This is achieved by providing information from non-speech segments as side (irrelevant) information to information bottleneck based clustering. Experiments on meeting recordings from the RT 06, 07 and 09 evaluation sets have shown that the proposed method decreases the diarization error rate by around 18% relative to the baseline speaker diarization system based on the information bottleneck framework. Comparison with a state of the art system based on the HMM/GMM framework shows that the proposed method significantly decreases the gap in performance between the information bottleneck system and the HMM/GMM system.
