Claude Barras scite author profile

Multistage speaker diarization of broadcast news

This paper describes recent advances in speaker diarization with a multi-stage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides a high cluster purity, but has a tendency to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the NIST RT-04F and ESTER evaluation data, the multi-stage system reduces the speaker error by over 70% relative to the baseline system, and gives between 40% and 50% reduction relative to a single-stage BIC clustering system.
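The abstract names BIC agglomerative clustering without giving details, so the following is only a minimal sketch of the textbook form of the idea, not the LIMSI implementation. It assumes 1-D features, one Gaussian per cluster, and a penalty weight `lam`; the function names `delta_bic` and `bic_cluster` are hypothetical.

```python
import math
from statistics import pvariance


def _log_var(xs):
    # Floor the variance so log() stays finite for near-constant clusters.
    return math.log(max(pvariance(xs), 1e-6))


def delta_bic(a, b, lam=1.0):
    """Delta-BIC for merging 1-D clusters a and b (one Gaussian each).

    Negative values favour merging. `lam` is an assumed penalty weight.
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    gain = 0.5 * (n * _log_var(a + b) - n_a * _log_var(a) - n_b * _log_var(b))
    # A 1-D Gaussian has 2 free parameters (mean, variance), so merging
    # saves 2 parameters: penalty = lam * 0.5 * 2 * log(n).
    penalty = lam * math.log(n)
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedily merge the pair with the lowest delta-BIC until none is negative."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        score, i, j = min(
            (delta_bic(clusters[i], clusters[j], lam), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score >= 0:
            break  # no remaining merge is supported by the criterion
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy data: two segments near 0 and two near 10 (stand-ins for two speakers).
segments = [
    [0.0, 0.2, -0.1, 0.1, -0.2],
    [0.1, -0.1, 0.0, 0.2, -0.2],
    [10.0, 10.2, 9.9, 10.1, 9.8],
    [10.1, 9.9, 10.0, 10.2, 9.8],
]
clusters = bic_cluster(segments)
```

In this toy setup the stopping rule is built into the criterion: merging stops once every candidate merge has a non-negative delta-BIC, so the number of clusters does not need to be fixed in advance.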


Transcriber: Development and use of a tool for assisting speech corpora production

Barras, Geoffrois, Wu, et al. 2001. Speech Communication.



Speaker Change Detection in Broadcast TV Using Bidirectional Long Short-Term Memory Networks

Yin, Bredin, Barras. 2017.




LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization

Lin, Yin, Li, et al. 2019.


More and more neural network approaches have achieved considerable improvements on submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this paper, we propose a supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve the performance. Experimental results show that our system significantly outperforms the state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
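To illustrate the final step described above, here is a minimal sketch of spectral clustering applied to a segment similarity matrix. It is an assumption-laden toy, not the paper's pipeline: the matrix is hand-written rather than produced by a Bi-LSTM, only the two-speaker case is handled (a sign split on the second eigenvector instead of k-means), and the function name `spectral_bipartition` is hypothetical.

```python
import math
import random


def spectral_bipartition(sim, iters=200, seed=0):
    """Split segments into two groups from a symmetric similarity matrix.

    Uses the second eigenvector of the normalised affinity
    M = D^-1/2 A D^-1/2, found by power iteration with deflation.
    """
    n = len(sim)
    deg = [sum(row) for row in sim]
    # Shift to (M + I) / 2 so all eigenvalues lie in [0, 1] and power
    # iteration cannot lock onto a large negative eigenvalue.
    m = [
        [0.5 * sim[i][j] / math.sqrt(deg[i] * deg[j]) + (0.5 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    # The leading eigenvector is known in closed form: sqrt(deg), normalised.
    norm = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / norm for d in deg]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along v1, then take one power-iteration step.
        dot = sum(a * b for a, b in zip(x, v1))
        x = [a - dot * b for a, b in zip(x, v1)]
        x = [sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in x))
        x = [a / length for a in x]
    # The sign of each entry decides the cluster; label numbering is arbitrary.
    return [0 if a < 0 else 1 for a in x]


# Hand-written toy similarities: segments 0-2 resemble each other, as do 3-4.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.9, 0.2, 0.1],
    [0.8, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.2, 0.1, 0.1, 0.9, 1.0],
]
labels = spectral_bipartition(sim)
```

With more than two speakers, a fuller version would keep several eigenvectors and cluster the rows of the resulting embedding, for example with k-means.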


Feature and score normalization for speaker verification of cellular data

Barras, Gauvain.




