Improving GMM–UBM speaker verification using discriminative feedback adaptation

Chao, Yi-Hsiang; Tsai, Wei-Ho; Wang, Hsin‐Min

doi:10.1016/j.csl.2009.01.002

Cited by 11 publications

(8 citation statements)

References 15 publications

“…2). GMM-UBM can describe the differences between speakers from the point of statistics and it can generate more sophisticated model description under a less speech training set, which has better robustness [15][16].…”

Section: B Speaker Clustering Based On Model Distance (mentioning)

confidence: 99%

“…It employs every speech segment for clustering according to MAP (Maximum A Posteriori), obtains GMM (Gaussian Mixture Model) of every speech segment for clustering adaptively, then constructs affinity matrix of spectral clustering by calculating probability distance based on a finite observation sequence model. GMM-UBM (Gaussian Mixture Model -Universal Background Model) is used in this paper can describe the speakers' differences from the point of statistics adequately, GMM-UBM can produce more precise model expression under the condition of a small voice training set, which has better robustness [15][16]. Probability distance based on a finite observation sequence model in this paper considers not only the models differences but also samples, and it can overcome the shortcoming of model description that is not accurate enough under the condition of the relative short length of speech segment.…”

Section: Introduction (mentioning)

confidence: 99%

An Algorithm of Speaker Clustering Based on Model Distance

Yang

2014

JMM

An algorithm based on Model Distance (MD) for spectral speaker clustering is proposed to address the weakness of general spectral clustering algorithms in describing the distribution of the signal source. First, a Universal Background Model (UBM) is created from a large number of independent speakers; then, a Gaussian Mixture Model (GMM) is trained from the UBM for every speech segment; finally, the probability distance between the GMMs of the speech segments is used to build an affinity matrix, and spectral clustering of speakers is performed on that matrix. Experimental results on news and conference data sets show an average improvement of 6.38% in F-measure over an algorithm based on feature-vector distance. In addition, the proposed algorithm is 11.72 times faster.

“…Unlike sigmoid function, the squared loss function has greater gradient for large d, which gives more penalty for the severe false verification segments. To give an intuitive illustration, we borrow a figure from (Chao et al, 2009) and show it in Fig. 3.…”

Section: Squared Loss Function (mentioning)

confidence: 99%
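
The gradient contrast described in this statement can be checked numerically. The sketch below is only an illustration: the plain quadratic d^2 and the slope parameter `a` are assumptions chosen for the example, not the loss function defined in the citing paper or in Chao et al. (2009), whose exact form is not reproduced in the excerpt.

```python
import math

def sigmoid_grad(d, a=1.0):
    """Derivative of the sigmoid loss 1 / (1 + exp(-a*d)) with respect to d."""
    s = 1.0 / (1.0 + math.exp(-a * d))
    return a * s * (1.0 - s)

def squared_grad(d):
    """Derivative of the illustrative quadratic loss d**2 (an assumed stand-in)."""
    return 2.0 * d

if __name__ == "__main__":
    # The sigmoid gradient peaks at d = 0 and shrinks toward zero as d grows,
    # while the quadratic gradient keeps growing with d.
    for d in (0.0, 1.0, 5.0, 10.0):
        print(f"d={d:5.1f}  sigmoid'={sigmoid_grad(d):.6f}  squared'={squared_grad(d):.1f}")
```

Under these assumptions, a badly misverified segment (large d) contributes almost nothing to a sigmoid-based update but dominates a quadratic one, which is the behavior the statement attributes to the squared loss.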

“…3. (Chao et al, 2009) Using this squared loss function, the gradient of the objective function w.r.t the mean vector will be…”

Section: Squared Loss Function (mentioning)

confidence: 99%

“…In (Angkititrakul & Hansen, 2007), the training process is divided into two stages: in the first stage, the MCE is used minimize the classification error among the in-set speaker models; in the second stage, the MVE is used to minimize the verification error between the in-set and background models. In (Chao et al, 2008;2009), the MVE methods are used to reinforce the discriminability between the target speaker model and the target speaker dependent anti-model. (2) Other approaches attempt to discriminatively adapt the target speaker model from the UBM, which can be viewed as the modification of the classical maximum a posteriori (MAP) adaptation (Gauvain & Lee, 1994).…”

mentioning

confidence: 99%
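
For readers unfamiliar with the classical MAP adaptation that this statement refers to, the following is a minimal sketch of the commonly used mean-only relevance-MAP update for a diagonal-covariance GMM. It is not the discriminative method of Chao et al.; the function name `map_adapt_means`, the argument layout, and the default relevance factor of 16 are assumptions made for illustration.

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a diagonal-covariance GMM.

    ubm_weights: (K,) mixture weights, ubm_means / ubm_vars: (K, D),
    frames: (T, D) feature vectors from the target speaker or segment.
    Returns the adapted means, shape (K, D).
    """
    # Log-density of every frame under every component: shape (T, K).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))
    # Component posteriors per frame, computed stably in the log domain.
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n_k = post.sum(axis=0)                                  # (K,)
    e_k = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]  # (K, D)
    # Data-dependent interpolation between the data mean and the UBM mean.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * ubm_means
```

Components that see many frames move toward the data mean, while components that see few frames stay close to the UBM, which is why the adapted model remains usable with little training speech.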
