2011
DOI: 10.1109/tasl.2010.2048603

An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization

Abstract: This correspondence describes a novel system for speaker diarization of meeting recordings based on the combination of acoustic features (MFCC) and Time Delay of Arrival (TDOA) features. The first part of the paper analyzes differences between MFCC and TDOA features, which possess completely different statistical properties. When Gaussian Mixture Models are used, experiments reveal that the diarization system is sensitive to the different recording scenarios (i.e., meeting rooms with varying numbers of microphones). In …

Citations: cited by 31 publications (25 citation statements)
References: 14 publications
“…As previously pointed out in [11], the IB system appears more robust to variation of weights across different meeting recordings. This robustness also holds in the case of all-pairs TDOA feature vectors.…”
Section: Methods
Citation type: supporting
confidence: 65%
“…Whenever multiple features are available, the combination is performed in the space of relevance variables Y [11]. Separate GMMs with the same number of components are trained for each feature stream.…”
Section: Information Bottleneck Diarization
Citation type: mentioning
confidence: 99%
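
The combination described in the statement above can be illustrated with a minimal sketch. It assumes that each feature stream has its own diagonal-covariance GMM, that component index y refers to the same value of the relevance variable Y in both streams, and that the per-stream posteriors p(y | x) are merged by a weighted average whose weights sum to one. The function names, toy parameters, and the 0.9/0.1 weights are hypothetical and chosen only for illustration; they are not taken from the cited work.

```python
import math


def component_posteriors(x, weights, means, variances):
    """Posterior p(y | x) over the components of a diagonal-covariance GMM."""
    log_joint = []
    for w, mu, var in zip(weights, means, variances):
        # log N(x | mu, diag(var)), summed over feature dimensions
        log_gauss = sum(
            -0.5 * ((xi - m) ** 2 / v + math.log(2 * math.pi * v))
            for xi, m, v in zip(x, mu, var)
        )
        log_joint.append(math.log(w) + log_gauss)
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - peak) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def combine_streams(posteriors, stream_weights):
    """Weighted average of per-stream posteriors over the shared variable Y."""
    if abs(sum(stream_weights) - 1.0) > 1e-9:
        raise ValueError("stream weights must sum to 1")
    n_components = len(posteriors[0])
    return [
        sum(w * p[y] for w, p in zip(stream_weights, posteriors))
        for y in range(n_components)
    ]


# Toy GMMs (hypothetical values): 3 components each, 2-D "MFCC", 1-D "TDOA".
mfcc_gmm = dict(
    weights=[0.5, 0.3, 0.2],
    means=[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
)
tdoa_gmm = dict(
    weights=[1 / 3, 1 / 3, 1 / 3],
    means=[[-1.0], [0.0], [1.0]],
    variances=[[0.5], [0.5], [0.5]],
)

p_mfcc = component_posteriors([0.9, 1.1], **mfcc_gmm)
p_tdoa = component_posteriors([0.1], **tdoa_gmm)
combined = combine_streams([p_mfcc, p_tdoa], [0.9, 0.1])
```

Because the two streams never share a likelihood, only a posterior over Y, the differing dimensionality and statistics of MFCC and TDOA features do not need to be reconciled in feature space; the stream weights are the only coupling between them.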
“…However, most of the existing methods assume that the microphone location is given to estimate the direction of arrival of speakers [3]-[6]. Some methods using Time Difference Of Arrival (TDOA) have been proposed [7]-[9], which do not assume the known microphone location. These methods propose using HMM for speaker segmentation and clustering, as well as hierarchical agglomerative clustering using spatial information.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…These methods propose using HMM for speaker segmentation and clustering, as well as hierarchical agglomerative clustering using spatial information. However, the methods have difficulty with overlapping speech [7] and estimating the number of speakers deterministically [8], [9].…”
Section: Introduction
Citation type: mentioning
confidence: 99%