On the use of i–vector posterior distributions in Probabilistic Linear Discriminant Analysis

Cumani, Sandro; Plchot, Oldřich; Laface, Pietro

doi:10.1109/taslp.2014.2308473

Cited by 60 publications

(51 citation statements)

References 22 publications

(33 reference statements)

“…Generative and discriminative models are two general approaches for language recognition based on i-vectors. Although the reported results using discriminative methods such as multiclass logistic regression and support vector machines are comparable to those of using generative models [22] such as Gaussian and probabilistic linear discriminant analysis (PLDA) models, the generative models provide an appropriate framework to benefit from the uncertainty in the i-vector extraction process through the posterior covariance matrix of the i-vector [23]. PLDA [24], originally studied in image processing, has been very successful in speaker and language recognition.…”

Section: PLDA Model (mentioning)

confidence: 99%

“…Fig.1 indicates a high correlation (0.98) between the proposed quality measure and utterance duration in the LRE15 database (described in Section 5.1). However, since the i-vector posterior covariance is also influenced by other factors such as background noise, channel type, incomplete transformations and the acoustic content of the utterance [19,23], we expect that the proposed quality measure captures more information about the quality of an utterance than its duration.…”

Section: Proposed I-vector Quality Measure (mentioning)

confidence: 99%

“…In contrast, length normalization [30] that maps i-vectors on the unit sphere by $\tilde{\phi} = \phi / \|\phi\|$ does not satisfy the Gaussian distribution and hence, the Gaussian assumption in PLDA is no longer applicable. To address this issue, one can either use a non-Gaussian assumption for the PLDA model such as the heavy-tailed PLDA model [25] or make the transformation linear using first-order Taylor series expansion around the i-vector posterior mean [23]. Applying a simplified version of first-order Taylor expansion around the i-vector posterior mean [23] results in the length normalized i-vector posterior mean and covariance as:…”

Section: Uncertainty Propagation Through the I-vector Postprocessing (mentioning)

confidence: 99%
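
The quotation above is cut off before its formulas, so the following is only a minimal sketch of the general technique it names, not the citing paper's expressions. It linearizes length normalization around the posterior mean with a first-order Taylor expansion and pushes the posterior covariance through the resulting Jacobian. The function name, the argument names, and the "scaled" variant (which keeps only the 1/‖φ‖ factor of the Jacobian) are assumptions made for illustration.

```python
import numpy as np


def length_normalize_posterior(mean, cov):
    """Propagate a Gaussian i-vector posterior through x -> x / ||x||.

    mean: (d,) posterior mean; cov: (d, d) posterior covariance.
    Returns the normalized mean, the first-order propagated covariance,
    and a simplified covariance that only applies the 1/||mean|| scaling.
    All names here are hypothetical, chosen for this sketch.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    norm = np.linalg.norm(mean)
    if norm == 0.0:
        raise ValueError("cannot length-normalize a zero vector")

    unit = mean / norm

    # Jacobian of f(x) = x / ||x|| evaluated at the posterior mean:
    # (I - u u^T) / ||mean||, with u the unit vector along the mean.
    jacobian = (np.eye(mean.size) - np.outer(unit, unit)) / norm

    # First-order propagation: Sigma' = J Sigma J^T.
    full_cov = jacobian @ cov @ jacobian.T

    # Simplified variant (assumption): drop the projection term and
    # keep only the isotropic scaling by 1 / ||mean||^2.
    scaled_cov = cov / norm**2

    return unit, full_cov, scaled_cov


unit, full_cov, scaled_cov = length_normalize_posterior([3.0, 4.0], np.eye(2))
```

With the full Jacobian, the propagated covariance has no variance along the direction of the normalized mean, because the linearized map cannot move a point off the unit sphere radially. The scaled variant keeps the covariance full rank, which may be more convenient when it is later inverted or added to a PLDA within-class covariance.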

Incorporating uncertainty as a Quality Measure in I-Vector Based Language Recognition

Poorjam, Saeidi, Kinnunen et al., 2016

The Speaker and Language Recognition Workshop (Odyssey 2016)

State-of-the-art language recognition systems model utterances with i-vectors. However, the uncertainty of the i-vector extraction process, represented by the i-vector posterior covariance, is affected by various factors such as channel mismatch, background noise, incomplete transformations and duration variability. In this paper, we propose a new quality measure based on the i-vector posterior covariance and incorporate it into the recognition process to improve the recognition accuracy. The experimental results with the LRE15 database and various duration conditions show a 2.9% relative improvement in terms of average performance cost as a result of incorporating the proposed quality measure in language recognition systems.

I-vector based language recognition

In this section, we describe the main components of an i-vector/PLDA-based language recognition system.

The i-vector framework

An i-vector is a low-dimensional feature vector for representing utterances of arbitrary duration. We assume that each utterance possesses a speaker- and channel-dependent GMM mean supervector, M, in the form [5]:

“…Other work based on the aleatoric uncertainty concept has also given rise to uncertainty propagation approaches for speaker recognition. However, these approaches focused on the issue of computing representations with insufficient data, caused by utterances with different, possibly short durations [9][10][11][12][13][14][15].…”

Section: Introduction (mentioning)

confidence: 99%

An Improved Uncertainty Propagation Method for Robust I-vector Based Speaker Recognition

Ribas, 2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. So far, different uncertainty propagation methods have been proposed to compensate noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method, whereby we model the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement. The proposed method improves the equal error rate (EER) by 4% relative compared to a conventional i-vector based speaker verification baseline. This is to be compared with previous methods which degrade performance.

“…The conventional speaker verification approach entails using i-vectors [3] and probabilistic linear discriminant analysis (PLDA) [2]. As a supervised learning method, i-vector requires sufficient statistics which are computed from a Gaussian Mixture Model-Universal Background Model (GMM-UBM), followed by a PLDA model to produce verification scores [3].…”

Section: Introduction (mentioning)

confidence: 99%

End-to-End Residual CNN with L-GM Loss Speaker Verification System

Shi, Zhu, 2018

2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)

We propose an end-to-end speaker verification system based on a neural network and trained with a loss function of lower computational complexity. The system consists of a ResNet architecture that extracts features from each utterance and produces utterance-level speaker embeddings, and it is trained using the large-margin Gaussian Mixture loss function. Owing to its large-margin and likelihood regularization, the large-margin Gaussian Mixture loss function benefits speaker verification performance. Experimental results demonstrate that the Residual CNN with large-margin Gaussian Mixture loss outperforms a DNN-based i-vector baseline by more than 10% in accuracy rate.
