Stochastic Feature Transformation with Divergence-Based Out-of-Handset Rejection for Robust Speaker Verification

Mak, Man‐Wai; Tsang, Chi Leung; Kung, Sun Yuan

doi:10.1155/s1110865704308048

Cited by 16 publications (8 citation statements)

References 27 publications (47 reference statements)


“…The graph structure was motivated by invariance against the affine feature distortion model for cepstral features (e.g. [151,155]). The method requires further development to validate the assumptions of the feature distortion model and to improve computational efficiency.…”

Section: Feature Normalization (mentioning)

confidence: 99%

An overview of text-independent speaker recognition: From features to supervectors

Kinnunen, Li

2010

Speech Communication


This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.


“…In this work, the feature transformation was combined with a handset selector (Tsang et al., 2002; Mak et al., 2004) for robust speaker verification. Specifically, before verification takes place, we compute one set of transformation parameters for each type of handsets that claimants are likely to use.…”

Section: Stochastic Feature Transformation and Handset Identification (mentioning)

confidence: 99%

“…Similar to our previous work (Mak et al., 2004; Yiu et al., 2003; Tsang et al., 2002), we trained a personalized 32-center GMM to model the characteristics of each client speaker in the system. The feature vectors derived from the SA and SX sentence sets of the corresponding speaker were used for training, i.e., 7 sentences per GMM.…”

Section: Enrollment Procedures (mentioning)

confidence: 99%

“…Because the LMS algorithm used in the ETSI standard is a kind of linear equalization algorithm, it may not perform satisfactorily on telephone handsets with nonlinear characteristics. To overcome the limitation of the LMS algorithm, this paper incorporates a feature transformation algorithm (Mak et al., 2004; Tsang et al., 2002, 2003) into the back-end recognizer to enhance the robustness of the speaker verification system against handset variations. Experiments with and without using the LMS-based blind equalization were also performed for comparison.…”

Section: Introduction (mentioning)

confidence: 99%


Extraction of Speaker Features from Different Stages of DSR Front-Ends for Distributed Speaker Verification

Mak, Sit, Kung

2005

Genet Resour Crop Evol

Self Cite


The ETSI has recently published a front-end processing standard for distributed speech recognition systems. The key idea of the standard is to extract the spectral features of speech signals at the front-end terminals so that acoustic distortion caused by communication channels can be avoided. This paper investigates the effect of extracting spectral features from different stages of the front-end processing on the performance of distributed speaker verification systems. A technique that combines handset selectors with stochastic feature transformation is also employed in a back-end speaker verification system to reduce the acoustic mismatch between different handsets. Because the feature vectors obtained from the back-end server are vector quantized, the paper proposes two approaches to adding Gaussian noise to the quantized feature vectors for training the Gaussian mixture speaker models. In one approach, the variances of the Gaussian noise are made dependent on the codeword distance. In another approach, the variances are a function of the distance between some unquantized training vectors and their closest code vector. The HTIMIT corpus was used in the experiments, and results based on 150 speakers show that stochastic feature transformation can be added to the back-end server for compensating transducer distortion. It is also found that better verification performance can be achieved when the LMS-based blind equalization in the standard is replaced by stochastic feature transformation.

Correspondence should be sent to M.W. Mak, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong. Email: enmwmak@polyu.edu.hk. Tel: (852)27666257. Fax: (852)23628439.


“…However, this approach may not be practical because users may use a new handset, which is not well represented in the training set, during verification. While this problem can be partially resolved by using a handset classifier with out-of-handset rejection capability [8,9], it is difficult to find a threshold for detecting unseen handsets. On the other hand, unsupervised (blind) compensation does not assume any knowledge of the channel characteristics.…”

Section: Introduction (mentioning)

confidence: 99%

Blind Stochastic Feature Transformation for Channel Robust Speaker Verification

Yiu, Mak, Cheung, et al.

2006

J VLSI Sign Process Syst Sign Image Video Technol


To improve the reliability of telephone-based speaker verification systems, channel compensation is indispensable. However, it is also important to ensure that the channel compensation algorithms in these systems suppress channel variations and enhance interspeaker distinction. This paper addresses this problem by a blind feature-based transformation approach in which the transformation parameters are determined online without any a priori knowledge of channel characteristics. Specifically, a composite statistical model formed by the fusion of a speaker model and a background model is used to represent the characteristics of enrollment speech. Based on the difference between the claimant's speech and the composite model, a stochastic matching type of approach is proposed to transform the claimant's speech to a region close to the enrollment speech. Therefore, the algorithm can estimate the transformation online without the necessity of detecting the handset types. Experimental results based on the 2001 NIST evaluation set show that the proposed transformation approach achieves significant improvement in both equal error rate and minimum detection cost as compared to cepstral mean subtraction, Znorm, and short-time Gaussianization.

