Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition

Hanilçi, Cemal; Ertaş, Figen

doi:10.1016/j.compeleceng.2010.08.001

Cited by 11 publications

(7 citation statements)

References 19 publications


“…For the same reason, since GMM is based on a different objective function, the differences between GMM and VQ type of models tend to be generally larger. In two recent independent studies [29,6], differences between these two clustering models were reported for different distance functions [29] and in SVM back-end setting [6]. We conclude that training methodology and data selection for UBM [30] are worth readdressing.…”

Section: Discussion (mentioning)

confidence: 99%

“…Two recent studies include more detailed comparisons of GMM and VQ [46,29]. In [46] the MAP trained VQ outperformed MAP-trained GMM for longer training data (2.5 minutes) but the situation was reversed for 10-second speech samples.…”

Section: Review Of Clustering Methods In Speaker Recognition (mentioning)

confidence: 99%

“…In [46] the MAP trained VQ outperformed MAP-trained GMM for longer training data (2.5 minutes) but the situation was reversed for 10-second speech samples. The study of [29] focused on the choice of dissimilarity measure (city-block, euclidean, Chebychev) in VQ and two different clustering initializations (binary LBG splitting [52] versus random selection). Differences in the identification and verification tasks, as well as ML versus MAP training were also considered.…”

Section: Review Of Clustering Methods In Speaker Recognition (mentioning)

confidence: 99%
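
For reference, the three dissimilarity measures named in the statement above (city-block, Euclidean, and Chebyshev, spelled "Chebychev" in the quote) are the Minkowski distances of order p = 1, p = 2, and the limit p → ∞. The following is a minimal Python sketch, assuming a VQ match score defined as the mean distance from each test frame to its nearest codeword. The function names `minkowski` and `average_distortion` and the toy vectors are illustrative assumptions, not taken from the cited papers.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives city-block, p = 2 gives Euclidean, and p = math.inf
    gives the Chebyshev (maximum coordinate difference) distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def average_distortion(frames, codebook, p):
    """Mean distance from each frame to its nearest codeword.

    Assumed scoring rule for this sketch: a lower value means the
    frames match the codebook (speaker model) more closely.
    """
    total = 0.0
    for frame in frames:
        total += min(minkowski(frame, codeword, p) for codeword in codebook)
    return total / len(frames)


if __name__ == "__main__":
    # Toy 2-dimensional "feature vectors" (hypothetical data).
    frames = [[0.0, 0.0], [3.0, 4.0]]
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    for name, p in [("city-block", 1), ("euclidean", 2), ("chebyshev", math.inf)]:
        print(name, average_distortion(frames, codebook, p))
```

The same frames and codebook give a different score under each metric, which is why the choice of distance can change which speaker model scores best.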

“…The existing comparisons in speaker recognition study only a few methods, use different features and datasets preventing meaningful cross-comparisons. Even in [29,46], only the basic EM and K-means algorithms were studied. Thus, extensive comparison of better clustering algorithms is still missing.…”

Section: Research Objectives and Hypotheses (mentioning)

confidence: 99%


Comparison of clustering methods: A case study of text-independent speaker modeling

Kinnunen, Sidoroff, Tuononen et al. 2011

Pattern Recognition Letters


Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various fields but, surprisingly, there are few extensive studies actually comparing them. An important question is how much the choice of a clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider parametric Gaussian mixture model (GMM) and non-parametric vector quantization (VQ) model using the best known clustering algorithms including iterative (K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map) and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and correlation of clustering quality and recognition accuracy. Experiments from these complementary observations indicate clustering is not a critical task in speaker recognition and the choice of the algorithm should be based on computational complexity and simplicity of the implementation. This is mainly because of three reasons: the data is not clustered, large models are used and only the best algorithms are considered. For low-order models, choice of the algorithm, however, can have a significant effect.


“…MSE is a common metric in the literature to compute the match score between the training and testing samples [22]. A better metric for computing match score in speaker recognition systems is a topic of on-going research [23]. MSE is computed according to the following expression.…”

Section: Speaker Recognition Using Mfcc (mentioning)

confidence: 99%

On the Performance Degradation of Speaker Recognition System due to Variation in Speech Characteristics Caused by Physiological Changes

Usman 2017

IJCDS


Speaker recognition is the process of identifying a person using their speech characteristics (voice biometrics). Speech characteristics of an individual can vary due to physiological changes which may be caused by health changes, physical activity as well as emotional changes. Such changes in speech characteristics are likely to affect the accuracy of speaker recognition systems. In this paper, the performance degradation of a speaker recognition system is quantified, empirically, when the characteristics of an individual's speech change due to physiological changes caused by 'physical activity'. The speaker recognition system used in this work is based on Mel-Frequency Cepstrum Coefficients (MFCCs) and Vector Quantization (VQ). When the speech sample of a user is obtained soon after high-intensity physical activity, the changes in the individual's speech characteristics affect the accuracy of speaker recognition systems. It is necessary to understand how speaker recognition systems are affected by changes in speech characteristics in order to improve their immunity to such changes. From speech recorded after physical activity, it is found that the duration of the 'voiced component', which has prominent discriminative characteristics of speech, is shortened, and that this affects the accuracy of the speaker recognition system.


Heuristic Clustering Algorithms

Bagirov, Karmitsa, Taheri 2020

Unsupervised and Semi-Supervised Learning
