Parallelizing Speaker-Attributed Speech Recognition for Meeting Browsing

Friedland, Gerald; Chong, Jike; Janin, Adam

doi:10.1109/ism.2010.26

Cited by 7 publications

(4 citation statements)

References 15 publications


“…Once they are computed all processing tasks take place in the binary domain. Other works in speaker diarization concerned with speed include [28], [29] which achieve faster than real-time processing through the use of several processing tricks applied to a standard bottom-up approach ([28]) or by parallelizing most of the processing in a GPU unit ([29]). The need for efficient diarization systems is emphasized when processing very large databases or when using diarization as a preprocessing step to other speech algorithms.…”

Section: Bottom-up Approach (mentioning)

confidence: 99%

Speaker Diarization: A Review of Recent Research

Anguera, Bozonnet, Evans, et al. 2012. IEEE Trans. Audio Speech Lang. Process.

Abstract: Speaker diarization is the task of determining "who spoke when?" in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher-level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly and pose different problems, such as having access to multiple microphones and multimodal information or overlapping speech. The most recent review of existing technology dates back to 2006 and focuses on the broadcast news domain. In this paper we review the current state-of-the-art, focusing on research developed since 2006 that relates predominantly to speaker diarization for conference meetings. Finally, we present an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data and identify important areas for future research.


“…(1) where α is the filter coefficient in the range [0.95, 0.98] [17], the pre-emphasized signal is then windowed using Hanning window(s) to improve the spectral representation of the speech vector [18]. Once the speech signal has been windowed and pre-emphasized, the Fast Fourier Transform (FFT) is calculated [15].…”

Section: Speaker Recondition System (mentioning)

confidence: 99%

Performance of Text-Independent Automatic Speaker Recognition on a Multicore System

Kouatly, Khan 2024. Tsinghua Sci. Technol.

This paper studies a high-speed text-independent Automatic Speaker Recognition (ASR) algorithm based on a multicore system's Gaussian Mixture Model (GMM). The high speed is achieved using parallel implementation of the feature's extraction and aggregation methods during training and testing procedures. Shared memory parallel programming techniques using both OpenMP and PThreads libraries are developed to accelerate the code and improve the performance of the ASR algorithm. The experimental results show speed-up improvements of around 3.2 on a personal laptop with Intel i5-6300HQ (2.3 GHz, four cores without hyper-threading, and 8 GB of RAM). In addition, a remarkable 100% speaker recognition accuracy is achieved.
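The front-end steps named in the citation statement above (pre-emphasis, Hanning windowing, FFT) can be sketched roughly as follows. This is an illustrative sketch, not the cited paper's code: the excerpt does not reproduce equation (1), so the common first-order form y[n] = x[n] - α·x[n-1] is assumed, and the function names, the frame length (400 samples), and the hop (160 samples) are hypothetical choices corresponding to 25 ms and 10 ms at 16 kHz.

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """Assumed first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha is taken from the quoted range [0.95, 0.98]; the first sample
    is passed through unchanged because it has no predecessor.
    """
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[:1], signal[1:] - alpha * signal[:-1])

def frame_spectra(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasize, cut into overlapping frames, apply a Hanning
    window to each frame, and return the FFT magnitude per frame."""
    emphasized = preemphasize(signal, alpha)
    if len(emphasized) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    # Index matrix: row i selects samples [i*hop, i*hop + frame_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    frames = emphasized[idx] * np.hanning(frame_len)
    # rfft keeps only the non-negative frequency bins of a real signal.
    return np.abs(np.fft.rfft(frames, axis=1))
```

For a one-second signal at 16 kHz, `frame_spectra` with these defaults returns one row per frame and `frame_len // 2 + 1` frequency bins per row. Frames are independent of each other, which is what makes this stage a natural candidate for the per-frame parallelism the abstract describes.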


“…A previous analysis of the diarization engine [28] showed that it is subject to two main computational bottlenecks: the training of the Gaussian Mixture Models, mostly during the merging phase that requires n² comparisons to determine the cluster pair to merge [20], and the Viterbi alignment. In prior work [19], it was shown that for the engine used here, Viterbi alignment can be replaced by a local majority vote without a significant change in accuracy.…”

Section: GMM Training on a GPU (mentioning)

confidence: 99%

Fast speaker diarization using a high-level scripting language

Gonina, Friedland, Cook, et al. 2011. 2011 IEEE Workshop on Automatic Speech Recognition & Understanding. Self-cite.

Abstract: Current state-of-the-art speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine the number of speakers in an audio recording. GMM training is a central computation in the agglomerative clustering approach, which presents computational challenges that limit performance and make real-time processing of audio very difficult. With the emergence of highly parallel multicore and manycore processors such as Graphics Processing Units (GPUs), we can re-implement GMM training for these processors to achieve faster than real-time performance by taking advantage of parallelism in the training computation. However, developing and maintaining the low-level GPU code is difficult and requires deep understanding of hardware architecture of the parallel processor. In this paper we present a speaker diarization application captured in under 50 lines of Python that achieves 50-200× faster than real-time performance by automatically executing computationally intensive GMM training on an NVIDIA GPU with no significant loss in accuracy.
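The two bottlenecks named in the citation statement above can be illustrated with a small sketch. It is not taken from the cited engine: the excerpt does not say how the "local majority vote" is windowed, so a centered sliding window over per-frame cluster labels is assumed here, and both function names and the window size are hypothetical.

```python
from collections import Counter

def majority_vote_smooth(labels, window=5):
    """Relabel each frame with the most common label in a centered
    window around it (assumed stand-in for Viterbi realignment).

    The window is truncated at the edges of the sequence.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def candidate_merge_pairs(n_clusters):
    """Number of unordered cluster pairs scored in one merge step.

    This is n(n-1)/2, which grows on the order of n^2 as in the quote.
    """
    return n_clusters * (n_clusters - 1) // 2
```

With 16 initial clusters, a single merge step already scores 120 candidate pairs, each requiring GMM work, which is consistent with the statement that GMM training during merging dominates the cost. The vote, by contrast, touches only a fixed number of neighbouring frames per output label.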
