The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 rich transcription evaluation

Moraru, Daniel; Meignier, Sylvain; Fredouille, Corinne; Besacier, Laurent; Bonastre, Jean-François

doi:10.1109/icassp.2004.1326000

Cited by 34 publications

(34 citation statements)

References 6 publications


“…A more integrated merging method is described in [49], while [35] describes a way of using the 2002 NIST speaker segmentation error metric to find regions in two inputs which agree and then uses these to train potentially more accurate speaker models. These systems generally produce performance gains, but tend to place some restriction on the systems being combined, such as the required architecture or equalizing the number of speakers.…”

Section: Combining Different Diarization Methods (mentioning)

confidence: 99%

“…Several methods of combining aspects of different diarization systems have been tried, for example the "hybridization" or "piped" CLIPS/LIA systems of [35] and [49] and the "plug and play" CUED/MIT-LL system of [20] which both combine components of different systems together. A more integrated merging method is described in [49], while [35] describes a way of using the 2002 NIST speaker segmentation error metric to find regions in two inputs which agree and then uses these to train potentially more accurate speaker models.…”

Section: Combining Different Diarization Methods (mentioning)

confidence: 99%


An overview of automatic speaker diarization systems

Tranter

Reynolds

2006

IEEE Trans. Audio Speech Lang. Process.



Abstract: Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating the searching and indexing of audio archives, and increasing the richness of automatic transcriptions, making them more readable. In this paper, we provide an overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, and discuss their relative merits and limitations. Performances using the different techniques are compared within the framework of the speaker diarization task in the DARPA EARS Rich Transcription evaluations. We also look at how the techniques are being introduced into real broadcast news systems and their portability to other domains and tasks such as meetings and speaker verification.


“…As far as the ELISA piped system [7] is concerned, the two systems seem to be complementary. In theory, we could possibly pipe our segmentation using the Gaussian features to the HMM-based LIA system [7] and get clusters with lower DER. We could apply the same process to the non-Gaussianized system and get clusters with…”

Section: Discussion (mentioning)

confidence: 99%

“…Several methods of combining different diarization systems exist. One example is the piped system [7] [8] where the segmentation from the CLIPS system is piped to the LIA system for better initialization. Another example is the cluster voting scheme [9] that combines the clusters from two speaker diarization systems.…”

Section: Introduction (mentioning)

confidence: 99%

Multiple feature combination to improve speaker diarization of telephone conversations

Gupta

Kenny

Ouellet

et al. 2007

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)


We report results on speaker diarization of telephone conversations. This speaker diarization process is similar to the multistage segmentation and clustering system used in broadcast news. It consists of an initial acoustic change point detection algorithm, iterative Viterbi re-segmentation, gender labeling, agglomerative clustering using a Bayesian information criterion (BIC), followed by agglomerative clustering using state-of-the-art speaker identification methods (SID) and Viterbi re-segmentation using Gaussian mixture models (GMMs). The Viterbi re-segmentation using GMMs is new, and it reduces the diarization error rate (DER) by 10%. We repeat these multistage segmentation and clustering steps twice: once with MFCCs as feature parameters for the GMMs used in gender labeling, SID and Viterbi re-segmentation steps, and another time with Gaussianized MFCCs as feature parameters for the GMMs used in these three steps. The resulting clusters from the parallel runs are combined in a novel way that leads to a significant reduction in the DER. On a development set containing 30 telephone conversations, this combination step reduced the DER by 20%. On another test set containing 30 telephone conversations, this step reduced the DER by 13%. The best error rate we have achieved is 6.7% on the development set, and 9.0% on the test set.
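The BIC-based agglomerative clustering step named in this abstract is often implemented as a pairwise merge test. The following is a minimal sketch of one common form of that test, not the authors' implementation. It assumes each cluster is modeled by a single diagonal-covariance Gaussian; the function names, the variance floor, and the `penalty_weight` parameter are illustrative choices. Under this sign convention, a negative value favors merging the two clusters and a positive value favors keeping them apart.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal ML covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # variance floor (assumed value)
    return total


def delta_bic(cluster_a, cluster_b, penalty_weight=1.0):
    """Delta-BIC for two clusters: two separate Gaussians versus one merged Gaussian.

    Positive: the two-model hypothesis wins (likely different speakers).
    Negative: the single-model hypothesis wins (merge candidates).
    """
    n_a, n_b = len(cluster_a), len(cluster_b)
    n = n_a + n_b
    dim = len(cluster_a[0])
    merged = cluster_a + cluster_b
    # Log-likelihood gain from modeling the clusters separately.
    gain = 0.5 * (
        n * log_det_diag(merged)
        - n_a * log_det_diag(cluster_a)
        - n_b * log_det_diag(cluster_b)
    )
    # Extra parameters of the second Gaussian: dim means + dim variances.
    n_params = 2 * dim
    penalty = 0.5 * penalty_weight * n_params * math.log(n)
    return gain - penalty


same_a = [[0.0], [1.0], [2.0], [3.0]]
same_b = [[0.0], [1.0], [2.0], [3.0]]
far_b = [[100.0], [101.0], [102.0], [103.0]]

print(delta_bic(same_a, same_b) < 0)  # identical statistics: merge favored
print(delta_bic(same_a, far_b) > 0)   # well-separated means: keep apart
```

In an agglomerative loop, the pair with the lowest delta-BIC would typically be merged first, stopping once no pair scores below zero. Real systems generally use full-covariance Gaussians or GMMs and tune the penalty weight on development data.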


“…This could be done by either assigning each utterance to multiple related clusters [30], or pre-segmenting utterances into small speaker-homogeneous regions and then clustering those regions. In parallel, speaker segmentation may be improved with the aid of speaker clustering [31]. Specifically, speech segments assigned to each cluster can be used to train a speaker-related model, thereby examining the speaker change boundaries of an audio recording in a manner of frame-by-frame recognition.…”

Section: Discussion (mentioning)

confidence: 99%

Speech utterance clustering based on the maximization of within-cluster homogeneity of speaker voice characteristics

Tsai

Wang

2006

The Journal of the Acoustical Society of America


This paper investigates the problem of how to partition unknown speech utterances into a set of clusters, such that each cluster consists of utterances from only one speaker, and the number of clusters reflects the unknown speaker population size. The proposed method begins by specifying a certain number of clusters, corresponding to one of the possible speaker population sizes, and then maximizes the level of overall within-cluster homogeneity of the speakers' voice characteristics. The within-cluster homogeneity is characterized by the likelihood probability that a cluster model, trained using all the utterances within a cluster, matches each of the within-cluster utterances. To attain the maximal sum of likelihood probabilities for all utterances, the proposed method applies a genetic algorithm to determine the cluster in which each utterance should be located. For greater computational efficiency, also proposed is a clustering criterion that approximates the likelihood probability with a divergence-based model similarity between a cluster and each of the within-cluster utterances. The clustering method then examines various legitimate numbers of clusters by adapting the Bayesian information criterion to determine the most likely speaker population size. The experimental results show the superiority of the proposed method over conventional methods based on hierarchical clustering.
