Abstract: When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable, or they cannot outperform the simpler case of using the best single microphone. In this work, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. The new techniques we present include blind reference-channel selection, two-step Time Delay of Arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with the use of such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using the single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
Index Terms: acoustic beamforming, speaker diarization, speaker segmentation and clustering, meetings processing.
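To make the idea of TDOA-based beamforming concrete, the following is a minimal delay-and-sum sketch in Python, not the frontend described in the abstract. Several points are assumptions made purely for illustration: the delay is estimated with GCC-PHAT (the abstract does not say how TDOA values are obtained), the reference channel is fixed by the hypothetical `ref_index` parameter rather than selected blindly, all channels receive equal weight rather than dynamic weights, and no Viterbi postprocessing is applied. The function names `gcc_phat_delay` and `delay_and_sum` are also hypothetical.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the integer lag (in samples) by which `sig` trails `ref`.

    Uses the generalized cross-correlation with phase transform (GCC-PHAT),
    an assumed estimator chosen for this sketch.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular overlap
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def delay_and_sum(channels, ref_index=0, max_lag=50):
    """Align every channel to the reference channel and average them.

    Returns the averaged signal and the per-channel delays in samples.
    """
    ref = channels[ref_index]
    out = np.zeros(len(ref))
    delays = []
    for ch in channels:
        d = gcc_phat_delay(ch, ref, max_lag)
        delays.append(d)
        # Circular shift for simplicity; samples at the edges wrap around.
        out += np.roll(ch, -d)
    return out / len(channels), delays
```

In a real meeting recording the delays change as speakers take turns, so a sketch like this would be applied to short analysis windows rather than to the whole signal at once.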
In this paper, we present a novel speaker segmentation and clustering algorithm. The algorithm automatically performs both speaker segmentation and clustering without any prior knowledge of the identities or the number of speakers. Advantages of this algorithm over other approaches are: no need for training/development data, no threshold adjustment requirements, and robustness to different data conditions. This paper also reports the performance of the algorithm on different datasets released by NIST with different initial conditions and parameter settings. The consistently low speaker diarization error rate clearly indicates the robustness of the algorithm.