Speaker Diarization Features: The UPM Contribution to the RT09 Evaluation

Pardo, José Manuel; Barra-Chicote, Roberto; San-Segundo, Rubén; Córdoba, Ricardo de; Martínez-González, Beatriz

doi:10.1109/tasl.2011.2159971

Cited by 11 publications

(9 citation statements)

References 21 publications


“…Most MDM systems use acoustic features as Mel-Frequency Cepstral Coefficients (MFCC) and localization features as the Time Delay Of Arrival (TDOA) values [1]. Other features used in some systems are the normalized energy of the channels [2] or the prosodic parameters [3] [4].…”

Section: Introduction (mentioning)

confidence: 99%

Selection of TDOA parameters for MDM speaker diarization

Martínez-González, Pardo, Echeverry-Correa et al. 2012

Interspeech 2012

Self Cite


Several methods to improve multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features are evaluated in this paper. All of them avoid the use of a single reference channel to calculate the TDOA values and, based on different criteria, select among all possible pairs of microphones a set of pairs that will be used to estimate the TDOAs. The evaluated methods have been named the "Dynamic Margin" (DM), the "Extreme Regions" (ER), the "Most Common" (MC), the "Cross Correlation" (XCorr) and the "Principal Component Analysis" (PCA). It is shown that all methods improve the baseline results for the development set and four of them also improve the results for the evaluation set. Relative DER improvements of 3.49% and 10.77% are obtained for DM and ER respectively on the test set. The XCorr and PCA methods achieve relative DER improvements of 36.72% and 30.82% on the test set. Moreover, the computational cost of the XCorr method is 20% lower than that of the baseline.


“…The system must be able to robustly cope with noisy ASR-processed corpora and with challenging data such as interviews, debates, home recordings, political speeches, etc. The use of diarization techniques for speaker-turn segmentation will allow the system creating homogeneous voices from heterogeneous recordings, because the number of speakers would be automatically estimated in a fully unsupervised way, and language-independent diarization techniques automatically could provide the temporal labels of the turns of a certain speaker [2,3].…”

Section: Introduction (mentioning)

confidence: 99%

Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

Lorenzo-Trueba, Martínez-González, Barra-Chicote et al. 2012

Interspeech 2012

Self Cite


“…In the speech community, the Qualcomm-ICSI-OGI front end [21] is commonly used to perform Wiener filtering. Most state-of-the-art speaker diarization systems [10,11,15,22] applied Wiener filtering to all audio channels for speech enhancement before filtered and summed to produce a beamformed audio channel. In Van Leeuwen and Konecny [23], the filtering, however, was applied after beamforming.…”

Section: Wiener Filter (mentioning)

confidence: 99%

“…The IAHC was initially proposed by Ajmera and Wooters [38] for speaker clustering and since then, the approach has been adopted by many others [11,15,22,23,178]. For completeness, this section provides the detail description of the IAHC clustering procedures.…”

Section: Iterative Agglomerative Hierarchical Clustering Framework Fo… (mentioning)

confidence: 99%

“…The speech frames obtained from the output of the SAD module are uniformly grouped into K_0 clusters, where the value of K_0 is empirically determined from development data set, typically K_0 is from 15-20. Despite being such a naive strategy, many systems [12,15,22,66,78,150] have adopted this method and reported competitive results. Another variation of uniform initialization was suggested in Sun et al. [10], where the segments from the SAD output were used.…”

mentioning

confidence: 99%
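
The uniform initialization described in the statement above can be illustrated with a minimal sketch. It assumes that "uniformly grouped" means splitting the sequence of speech frames into K_0 contiguous blocks of near-equal length; the quoted text does not specify this, and the function name `uniform_init` and its parameters are hypothetical, not taken from any cited system.

```python
def uniform_init(num_frames, k0):
    """Assign each speech frame an initial cluster label in 0..k0-1.

    Assumption: frames are split into k0 contiguous blocks of
    near-equal length (block sizes differ by at most one frame).
    """
    if k0 <= 0 or num_frames < k0:
        raise ValueError("need at least one frame per cluster and k0 > 0")
    base, extra = divmod(num_frames, k0)
    labels = []
    for cluster_id in range(k0):
        # The first `extra` clusters absorb one leftover frame each.
        size = base + (1 if cluster_id < extra else 0)
        labels.extend([cluster_id] * size)
    return labels


# Example: 1000 speech frames, K_0 = 16 (within the quoted 15-20 range).
initial_labels = uniform_init(1000, 16)
```

These labels would then serve only as a starting point: an agglomerative clustering stage would typically retrain cluster models and merge clusters until a stopping criterion is met.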


Speaker diarization in meetings domain

Nguyen


The purpose of this study is to develop robust techniques for speaker segmentation and clustering with focus on meetings domain. The techniques examined can however be applied to any other domains such as telephone and broadcast news. Traditional techniques for speaker diarization developed for telephone conversations or broadcast news are based on a single channel, which is notably different from meetings domain which can have multiple channels. These techniques when adapted to meetings domain however perform poorer than expected since they do not exploit direction of arrival information, which is available in many meeting rooms with the presence of multiple microphones. Moreover, many of these techniques are involved with tunable parameters, which are presumably derived using external data. These parameters need to be individually adjusted for each data set accordingly to obtain reasonable performance. In this thesis, the focus is on robust and accurate speaker diarization techniques in meetings.

I also want to thank my colleagues Ma Bin and Kong Aik for their understanding and support at work to let me spend time pursuing my own research. Also to my friends Wang Lei and Xiao Xiong whose discussions though not many but are really enjoyable and helpful. To Hanwu and Tin Lay who contributed to the research platform which I am tremendously benefited from. Lastly, this work could not have been achieved without the love and support of my family, my dad who instills into me the persistence, my mum who cares for me anytime and my fiancee who loves me and brightens up my life.
