2012
DOI: 10.1109/tasl.2011.2159971

Speaker Diarization Features: The UPM Contribution to the RT09 Evaluation

Abstract: Two new features were proposed and used by the Universidad Politécnica de Madrid in the Rich Transcription Evaluation 2009; both outperform the baseline system. The first feature is the intensity channel contribution, which is related to the location of the speaker. The second feature is the logarithm of the interpolated fundamental frequency. This is the first time that both features have been applied to the clustering stage of multiple distant microphone (MDM) meeting diarization. It is …

Citations: cited by 11 publications (9 citation statements)
References: 21 publications

“…Most MDM systems use acoustic features such as Mel-Frequency Cepstral Coefficients (MFCC) and localization features such as the Time Delay Of Arrival (TDOA) values [1]. Other features used in some systems are the normalized energy of the channels [2] or the prosodic parameters [3] [4].…”
Section: Introduction (mentioning)
confidence: 99%
“…The system must be able to robustly cope with noisy ASR-processed corpora and with challenging data such as interviews, debates, home recordings, political speeches, etc. The use of diarization techniques for speaker-turn segmentation will allow the system to create homogeneous voices from heterogeneous recordings, because the number of speakers would be automatically estimated in a fully unsupervised way, and language-independent diarization techniques could automatically provide the temporal labels of the turns of a certain speaker [2,3].…”
Section: Introduction (mentioning)
confidence: 99%
“…In the speech community, the Qualcomm-ICSI-OGI front end [21] is commonly used to perform Wiener filtering. Most state-of-the-art speaker diarization systems [10,11,15,22] applied Wiener filtering to all audio channels for speech enhancement before the channels were filtered and summed to produce a beamformed audio channel. In Van Leeuwen and Konecny [23], however, the filtering was applied after beamforming.…”
Section: Wiener Filter (mentioning)
confidence: 99%
“…The IAHC was initially proposed by Ajmera and Wooters [38] for speaker clustering and, since then, the approach has been adopted by many others [11,15,22,23,178]. For completeness, this section provides a detailed description of the IAHC clustering procedures.…”
Section: Iterative Agglomerative Hierarchical Clustering Framework Fo… (mentioning)
confidence: 99%