Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors

Silnova, Anna; Brümmer, Niko; Garcia‐Romero, Daniel; Snyder, David; Burget, Lukáš

doi:10.21437/interspeech.2018-2128

Cited by 22 publications

(19 citation statements)

References 9 publications

“…There is a consistent improvement over the systems without s-norm (15)(16)(17). Fusion of these three systems (18)(19)(20) forms our primary submission (system 23) to the fixed condition. We have also run a post-evaluation fusion with the same systems without s-norm (15)(16)(17), which is shown in row 24.…”

Section: Results and Analysis (mentioning)

confidence: 76%

Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge

et al. 2019

Self Cite

This paper is a post-evaluation analysis of our efforts in the VOiCES 2019 Speaker Recognition challenge. All systems in the fixed condition are based on x-vectors with different features and DNN topologies. The single best system reaches a minDCF of 0.38 (5.25% EER), and a fusion of 3 systems yields a minDCF of 0.34 (4.87% EER). We also analyze how speaker verification (SV) systems have evolved in the last few years and show results on the SITW 2016 Challenge. EER on the core-core condition of the SITW 2016 challenge dropped from 5.85% to 1.65% for system fusions submitted for SITW 2016 and VOiCES 2019, respectively. The less restrictive open condition allowed us to use external data for PLDA adaptation and achieve a small additional performance improvement. In our submission to the open condition, we used three x-vector systems and one system based on i-vectors.

“…• Training networks with 9 epochs (instead of 3 [19]. It was trained on concatenated audio files from VOXCELEB 1 and 2. Length normalization, centering, LDA, reducing dimensionality of vectors to 300, followed by another length normalization were applied to all i-vectors.…”

Section: X-vector Systems (mentioning)

confidence: 99%

Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge

et al. 2019

Self Cite

“…For the baseline x-vector (architecture (a)), we used the generative Heavy Tailed PLDA (HT-PLDA) classifier described in [11], as it was shown to outperform a Gaussian PLDA system. The HT-PLDA was trained using the x-vectors from the 485,385 VoxCeleb recordings that we processed by centering and whitening, but no unit-length projection was applied [10].…”

Section: HT-PLDA Scoring (mentioning)

confidence: 99%
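
The preprocessing named in the statement above (centering and whitening, without a unit-length projection) can be sketched as follows. This is only a minimal illustration under assumptions: the function names `fit_center_whiten`, `center_whiten` and `length_normalize` are hypothetical, whitening is done here through an eigendecomposition of the training covariance, and none of this is taken from the cited papers' code.

```python
import numpy as np

def fit_center_whiten(train):
    """Estimate the mean and a whitening matrix from training embeddings (one per row)."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # cov is symmetric, so eigh applies
    eigvals = np.maximum(eigvals, 1e-10)     # guard against near-zero variance directions
    W = eigvecs / np.sqrt(eigvals)           # scale each eigenvector column by 1/sqrt(eigenvalue)
    return mean, W

def center_whiten(x, mean, W):
    """Subtract the training mean, then whiten; on the training set the covariance becomes identity."""
    return (x - mean) @ W

def length_normalize(x):
    """Unit-length projection: the step that the quoted setup does NOT apply."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```

Calling `fit_center_whiten` on the training embeddings and then `center_whiten` on both training and test embeddings mirrors the quoted setup; `length_normalize` is shown only to make explicit which step is left out there.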

“…Once the DNN is trained, the embeddings are extracted for each recording and compared using a similarity metric. The metric learning process is disjoint from the DNN training and it is typically done using some variant of probabilistic linear discriminant analysis (PLDA) [8,9,10,11].…”

Section: Introduction (mentioning)

confidence: 99%

x-Vector DNN Refinement with Full-Length Recordings for Speaker Recognition

et al. 2019

Self Cite

State-of-the-art text-independent speaker recognition systems for long recordings (a few minutes) are based on deep neural network (DNN) speaker embeddings. Current implementations of this paradigm use short speech segments (a few seconds) to train the DNN. This introduces a mismatch between training and inference when extracting embeddings for long-duration recordings. To address this, we present a DNN refinement approach that updates a subset of the DNN parameters with full recordings to reduce this mismatch. At the same time, we also modify the DNN architecture to produce embeddings optimized for cosine distance scoring. This is accomplished using a large-margin strategy with angular softmax. Experimental validation shows that our approach is capable of producing embeddings that achieve record performance on the SITW benchmark.

“…Although adversarial learning based unsupervised DA [18,19] has greatly boosted the performance of SV systems under domain mismatch scenarios, the adversarial training may lead to non-Gaussian latent vectors, which do not meet the Gaussianity requirement of the PLDA backend. This problem can be solved by using heavy-tailed PLDA [21,22] or applying i-vector length normalization [23]. However, the former is more computationally expensive than the Gaussian PLDA and the latter is not really a Gaussianization procedure but a sub-optimal compromise.…”

Section: Introduction (mentioning)

confidence: 99%

Variational Domain Adversarial Learning for Speaker Verification

Mak, Chien 2019

Interspeech 2019

Domain mismatch refers to the problem in which the distribution of training data differs from that of the test data. This paper proposes a variational domain adversarial neural network (VDANN), which consists of a variational autoencoder (VAE) and a domain adversarial neural network (DANN), to reduce domain mismatch. The DANN part aims to retain speaker identity information and learn a feature space that is robust against domain mismatch, while the VAE part imposes variational regularization on the learned features so that they follow a Gaussian distribution. Thus, the representation produced by VDANN is not only speaker-discriminative and domain-invariant but also Gaussian distributed, which is essential for the standard PLDA backend. Experiments on both SRE16 and SRE18-CMN2 show that VDANN outperforms the Kaldi baseline and the standard DANN. The results also suggest that VAE regularization is effective for domain adaptation.
