2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461958

End-to-End DNN Based Speaker Recognition Inspired by I-Vector and PLDA

Abstract: Recently, several end-to-end speaker verification systems based on deep neural networks (DNNs) have been proposed. These systems have been proven to be competitive for text-dependent tasks as well as for text-independent tasks with short utterances. However, for text-independent tasks with longer utterances, end-to-end systems are still outperformed by standard i-vector + PLDA systems. In this work, we develop an end-to-end speaker verification system that is initialized to mimic an i-vector + PLDA baseline. T…

Citations: cited by 47 publications (45 citation statements)
References: 19 publications
“…As we will see in the results section, the approach above leads to good discrimination performance over a large set of conditions, sometimes improving over the baseline, in agreement with results found in [6]. Nevertheless, calibration performance of this approach is far from optimal on many conditions.…”
Section: Proposed Discriminative Backend | Type: supporting
confidence: 84%
“…We propose a backend with the same functional form as the PLDA backend explained in the previous section, but where all parameters are optimized jointly, in a manner similar to the one used in [6] (though, note that in this paper we only optimize jointly up to the backend stage instead of the full pipeline, as in Rohdin's paper). We first initialize all parameters in Equations (1), (2) and (5) as in the standard PLDA-based backend.…”
Section: Proposed Discriminative Backend | Type: mentioning
confidence: 99%
“…We show that, with such an approach, we can achieve a reasonable performance. Our results are perhaps not as competitive as those achieved with current state-of-the-art x-vector systems [18], nevertheless, we are now closer to our goal which is to further use this model in the fully end-to-end discriminative system [19] that can be initialized from a robust generative baseline. Figure 1: Scheme of an end-to-end speaker verification system based on a feed forward NN designed to mimic a generic speaker verification system ([19]).…”
Section: Introduction | Type: mentioning
confidence: 77%
“…In [19], we had built an end-to-end system (Fig. 1) that already seemingly fits our goal, but it was exactly the i-vector extractor component that posed the biggest challenge and we had to resort to ad-hoc simplifications, such as PCA-based dimensionality reduction of large dimensional sufficient statistics coming from the GMM-UBM.…”
Section: Theoretical Background | Type: mentioning
confidence: 99%