2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960555
The I4U system in NIST 2008 speaker recognition evaluation

Cited by 23 publications (12 citation statements). References 15 publications.
“…However, there are some common practices that we can follow. To facilitate discussion, we present here the results of the latest NIST 2008 speaker recognition evaluation submission by the I4U consortium [138]. All of the I4U classifiers used short-term spectral features, and the focus was on the supervector classifiers.…”
Section: Summary: Which Supervector Methods To Use? (mentioning; confidence: 99%)
“…They should be augmented with nuisance attribute projection (NAP) [28] and test normalization (T-norm) [14]. Table 1: Performance of the individual classifiers and their fusion in the I4U system on I4U's telephone-quality development dataset [138]. UNC = Uncompensated, EIG = Eigenchannel, JFA = Joint factor analysis, GLDS = Generalized linear discriminant sequence, GSV = Gaussian supervector, FT = Feature transformation, PSK = Probabilistic sequence kernel, BK = Bhattacharyya kernel.…”
Section: Summary: Which Supervector Methods To Use? (mentioning; confidence: 99%)
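The quoted passage recommends augmenting supervector classifiers with nuisance attribute projection (NAP) and test normalization (T-norm). The sketch below illustrates both ideas in NumPy; the variable names, dimensions, and random data are illustrative assumptions only, not the I4U configuration.

```python
import numpy as np

def nap_project(supervectors, nap_basis):
    """Remove nuisance (e.g. channel) directions from supervectors.

    supervectors : (N, D) array of GMM mean supervectors
    nap_basis    : (D, K) orthonormal basis spanning the nuisance subspace
    Returns the supervectors with the nuisance subspace projected out:
    s' = (I - U U^T) s
    """
    return supervectors - (supervectors @ nap_basis) @ nap_basis.T

def t_norm(raw_score, cohort_scores):
    """Test normalization: standardize a trial score with the scores that
    the same test utterance obtains against a cohort of impostor models."""
    return (raw_score - np.mean(cohort_scores)) / np.std(cohort_scores)

# Illustrative usage with random data (all dimensions are arbitrary).
rng = np.random.default_rng(0)
svecs = rng.standard_normal((10, 512))                # 10 supervectors, dim 512
U, _ = np.linalg.qr(rng.standard_normal((512, 40)))   # 40 nuisance directions
svecs_nap = nap_project(svecs, U)
score_tnorm = t_norm(1.7, rng.standard_normal(200))
```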
“…The size of the matrices becomes enormous when more sessions are available for each speaker in the development data. This is typically the case for speaker recognition, where the number of utterances per speaker usually ranges from ten to over a hundred [38,39]. In the following, we estimate the parameters {μ, F, G, Σ} of the PLDA model using the expectation-maximization (EM) algorithm.…”
Section: Probabilistic LDA (mentioning; confidence: 99%)
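The PLDA model referred to here has parameters {μ, F, G, Σ}. As a rough illustration of the EM estimation mentioned in the snippet, the following NumPy sketch fits a simplified variant with only the speaker factor F; the channel factor G is dropped, and all names, dimensions, and data are assumptions for illustration, not the cited paper's recipe.

```python
import numpy as np

def plda_em_simplified(X, spk_ids, n_factors, n_iter=10):
    """Fit a simplified PLDA model x = mu + F h + eps, eps ~ N(0, Sigma), by EM.
    The channel factor G of the full model is omitted to keep the sketch short."""
    X = np.asarray(X, dtype=float)
    spk_ids = np.asarray(spk_ids)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    S_xx = Xc.T @ Xc                                   # sum_x (x-mu)(x-mu)^T

    rng = np.random.default_rng(0)
    F = 0.1 * rng.standard_normal((D, n_factors))
    Sigma = np.cov(Xc, rowvar=False) + 1e-3 * np.eye(D)

    groups = [np.flatnonzero(spk_ids == s) for s in np.unique(spk_ids)]
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        A = np.zeros((D, n_factors))                   # sum_x (x-mu) E[h]^T
        B = np.zeros((n_factors, n_factors))           # sum_x E[h h^T]
        for idx in groups:
            Xi = Xc[idx]                               # all sessions of one speaker
            n_i = len(idx)
            # E-step: posterior of the speaker factor h shared by the n_i sessions
            L = np.eye(n_factors) + n_i * F.T @ Sinv @ F    # posterior precision
            Cov_h = np.linalg.inv(L)
            mean_h = Cov_h @ (F.T @ Sinv @ Xi.sum(axis=0))
            Ehh = Cov_h + np.outer(mean_h, mean_h)
            A += np.outer(Xi.sum(axis=0), mean_h)
            B += n_i * Ehh
        # M-step: closed-form updates for F and Sigma
        F = A @ np.linalg.inv(B)
        Sigma = (S_xx - F @ A.T) / N
        Sigma = 0.5 * (Sigma + Sigma.T)                # keep Sigma symmetric
    return mu, F, Sigma

# Illustrative usage: 20 synthetic speakers, 8 sessions each, feature dim 60.
rng = np.random.default_rng(1)
spk = np.repeat(np.arange(20), 8)
data = rng.standard_normal((160, 60))
mu, F, Sigma = plda_em_simplified(data, spk, n_factors=10)
```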
“…In (38), K = log|M₂|/2 − log|M₁| is constant for the given set of parameters {F, G, Σ}. Though K vanishes when score normalization is applied, the two log-determinant terms can be calculated easily by using the properties of eigenvalue decomposition.…”
Section: PLDA Verification Score (mentioning; confidence: 99%)
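The constant K above is built from log-determinants of model covariance matrices. As the snippet notes, such terms are easy to obtain from an eigenvalue decomposition, since log|M| of a symmetric positive-definite M is the sum of the logarithms of its eigenvalues. The sketch below demonstrates this with hypothetical matrices M1 and M2; it is not the scoring code of the cited work.

```python
import numpy as np

def logdet_from_eig(M):
    """log|M| for a symmetric positive-definite matrix via its eigenvalues:
    log|M| = sum_i log(lambda_i)."""
    eigvals = np.linalg.eigvalsh(M)        # eigenvalues of a symmetric matrix
    return float(np.sum(np.log(eigvals)))

# Hypothetical covariance matrices standing in for M1 and M2 in the text.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
M1 = A @ A.T + 50 * np.eye(50)             # symmetric positive definite
B = rng.standard_normal((100, 100))
M2 = B @ B.T + 100 * np.eye(100)

# The constant term of the verification score as written in the snippet.
K = logdet_from_eig(M2) / 2 - logdet_from_eig(M1)

# Cross-check against NumPy's slogdet.
sign, logdet = np.linalg.slogdet(M1)
assert sign > 0 and np.isclose(logdet, logdet_from_eig(M1))
```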
“…It is generally agreed that the integration of different discriminative cues can improve the performance of language recognition [16]-[18]. In IIR's submission to the 2009 NIST LRE [26], seven language classifiers were developed for language recognition, as follows:…”
Section: Description of the Sub-systems (mentioning; confidence: 99%)
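The snippet describes combining seven language classifiers. One common way to fuse sub-system scores at the score level (not necessarily the method used in the IIR submission) is linear logistic regression fusion; the sketch below uses scikit-learn on entirely synthetic scores and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic example: scores of 7 sub-systems for 1000 trials (1 = target trial).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
scores = rng.standard_normal((1000, 7)) + labels[:, None]   # 7 sub-system scores

# Linear logistic regression fusion: learn one weight per sub-system plus a bias,
# so the fused score is a weighted sum of the individual classifier scores.
fuser = LogisticRegression()
fuser.fit(scores, labels)

fused_llr = fuser.decision_function(scores)   # fused log-odds score per trial
print(fuser.coef_, fuser.intercept_)
```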