C. Woofers scite author profile

One of the sub-tasks of the Spring 2004 and Spring 2005NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-andsum beamforming techniques to fuse the signals from each of the multiple distant microphones into a single enhanced signal. This approach can be thought of a many-to-one preprocessing approach. In the pre-processing approach we propose, the time delay of arrival (TDOA) between each of the multiple distant channels and a reference channel is computed incrementally using a window that steps through the signals from each of the multiple microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test our approach on the 2004 and 2005 NIST meetings evaluation databases and show that the technique performs very well.

show abstract

Automatic Weighting for the Combination of TDOA and Acoustic Features in Speaker Diarization for Meetings

Anguera

Woofers

Pardo

et al. 2007

View full text Add to dashboard Cite

In the task of speaker diarization for meetings it has been shown in previous work that it is useful to use the Time Delay of Arrival (TDOA) between the different audio channels in the meeting room as an extra source of information in addition to the acoustic features. When combining feature streams, we use a weight to control the relative contributions of the streams. In the past, this weight was determined using development data and the same weight value was applied to all meetings. In this paper we present a method for automatically determining the weight. A metric derived from the Bayesian Information Criterion (BIC) computed for each feature stream estimates the weight for each meeting on the initial clustering iteration and adapts its value throughout the diarization process. By using this technique we achieve a more robust system and up to 18.2% relative improvement over the method of tuning the weight on development data.

show abstract

Model Complexity Selection and Cross-Validation EM Training for Robust Speaker Diarization

Anguera

Shinozaki

Woofers

et al. 2007

View full text Add to dashboard Cite

Accurate modeling of speaker clusters is important in the task of speaker diarization. Creating accurate models involves both selection of the model complexity and optimum training given the data. Using models with fixed complexity and trained using the standard EM algorithm poses a risk of overfitting, which can lead to a reduction in diarization performance. In this paper a technique proposed by the author to estimate the complexity of a model is combined with a novel training algorithm called "Cross-Validation EM" to control the number of training iterations. This combination leads to more robust speaker modeling and results in an increase in speaker diarization performance. Tests on the NIST RT (MDM) datasets for meetings show a relative improvement of 10.6% relative on the test set.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

C. Woofers

Speech segmentation and spoken document processing

Speaker diarization for multi-party meetings using acoustic fusion

Automatic Weighting for the Combination of TDOA and Acoustic Features in Speaker Diarization for Meetings

Model Complexity Selection and Cross-Validation EM Training for Robust Speaker Diarization

Contact Info

Product

Resources

About