End-to-End DNN Based Speaker Recognition Inspired by I-Vector and PLDA

Rohdin, Johan; Silnova, Anna; Díez, Mireia; Plchot, Oldřich; Matějka, Pavel; Burget, Lukáš

doi:10.1109/icassp.2018.8461958

Cited by 47 publications

(45 citation statements)

References 19 publications

“…As we will see in the results section, the approach above leads to good discrimination performance over a large set of conditions, sometimes improving over the baseline, in agreement with results found in [6]. Nevertheless, calibration performance of this approach is far from optimal on many conditions.…”

Section: Proposed Discriminative Backend (supporting)

confidence: 84%

“…We propose a backend with the same functional form as the PLDA backend explained in the previous section, but where all parameters are optimized jointly, in a manner similar to the one used in [6] (though, note that in this paper we only optimize jointly up to the backend stage instead of the full pipeline, as in Rohdin's paper). We first initialize all parameters in Equations (1), (2) and (5) as in the standard PLDA-based backend.…”

Section: Proposed Discriminative Backend (mentioning)

confidence: 99%

“…The parameters for the embedding extractor and the scorer are trained jointly to optimize binary cross-entropy. In [6], the authors propose to use an architecture that mimics the previous i-vector [8] pipeline for speaker verification, pretraining all its parameters separately and then fine tuning the full model to minimize binary cross-entropy. While these two approaches have the potential to result in well-calibrated scores, neither of the two papers show overall system performance, only discrimination performance.…”

Section: Introduction (mentioning)

confidence: 99%

A Discriminative Condition-Aware Backend for Speaker Verification

Ferrer

McLaren

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We present a scoring approach for speaker verification that mimics the standard PLDA-based backend process used in most current speaker verification systems. However, unlike the standard backends, all parameters of the model are jointly trained to optimize the binary cross-entropy for the speaker verification task. We further integrate the calibration stage inside the model, making the parameters of this stage depend on metadata vectors that represent the conditions of the signals. We show that the proposed backend has excellent out-of-the-box calibration performance on most of our test sets, making it an ideal approach for cases in which the test conditions are not known and development data is not available for training a domain-specific calibration model.

“…We show that, with such an approach, we can achieve a reasonable performance. Our results are perhaps not as competitive as those achieved with current state-of-the-art x-vector systems [18], nevertheless, we are now closer to our goal which is to further use this model in the fully end-to-end discriminative system [19] that can be initialized from a robust generative baseline. Figure 1: Scheme of an end-to-end speaker verification system based on a feed forward NN designed to mimic a generic speaker verification system ([19]).…”

Section: Introduction (mentioning)

confidence: 77%

“…In [19], we had built an end-to-end system (Fig. 1) that already seemingly fits our goal, but it was exactly the i-vector extractor component that posed the biggest challenge and we had to resort to ad-hoc simplifications, such as PCA-based dimensionality reduction of large dimensional sufficient statistics coming from the GMM-UBM.…”

Section: Theoretical Background (mentioning)

confidence: 99%

Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition

Novotný

Plchot

Glembek

et al. 2019

Interspeech 2019

In this work, we continue in our research on i-vector extractor for speaker verification (SV) and we optimize its architecture for fast and effective discriminative training. We were motivated by computational and memory requirements caused by the large number of parameters of the original generative i-vector model. Our aim is to preserve the power of the original generative model, and at the same time focus the model towards extraction of speaker-related information. We show that it is possible to represent a standard generative i-vector extractor by a model with significantly less parameters and obtain similar performance on SV tasks. We can further refine this compact model by discriminative training and obtain i-vectors that lead to better performance on various SV benchmarks representing different acoustic domains.
