Interspeech 2018
DOI: 10.21437/interspeech.2018-1209

Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition

Abstract: Deep neural network (DNN) based speaker embeddings have become increasingly popular in the text-independent speaker recognition task. In contrast to a generatively trained i-vector extractor, a DNN speaker embedding extractor is usually trained discriminatively in a closed-set classification scenario using softmax. The problem addressed in this paper is choosing a backend solution for speaker verification scoring on top of DNN-based speaker embeddings. There are several options to perform speaker verification in the DNN emb…
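The approach named in the title, a triplet loss computed on cosine similarity between speaker embeddings, can be sketched as follows. This is a minimal illustration assuming PyTorch; the margin, batch size, and embedding dimension are illustrative choices, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive share a speaker; negative comes from a different one
    sim_ap = F.cosine_similarity(anchor, positive, dim=1)
    sim_an = F.cosine_similarity(anchor, negative, dim=1)
    # hinge: same-speaker similarity should exceed different-speaker
    # similarity by at least `margin`
    return torch.clamp(sim_an - sim_ap + margin, min=0.0).mean()

# random tensors stand in for a DNN extractor's embeddings (illustrative)
a = torch.randn(32, 256, requires_grad=True)
p = torch.randn(32, 256, requires_grad=True)
n = torch.randn(32, 256, requires_grad=True)
cosine_triplet_loss(a, p, n).backward()
```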

Cited by 50 publications (32 citation statements) · References 28 publications (46 reference statements)

“…Results demonstrated that the proposed pAUC deep embedding is highly competitive with state-of-the-art identification-loss-based deep embedding methods using Softmax and ArcSoftmax output units. Note that a very recent work [20], proposed at the same time as our work in [23], maximizes the area under the ROC curve (AUC) for text-dependent speaker verification. It can be shown that AUC is a particular case of pAUC, and experimental results show that the pAUC deep embedding significantly outperforms the AUC deep embedding.…”
Section: Introduction
confidence: 82%
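The AUC/pAUC relationship in the quoted statement can be checked numerically. A small sketch assuming scikit-learn, whose roc_auc_score accepts a max_fpr argument (it returns the McClish-standardized pAUC, which reduces to the ordinary AUC when max_fpr=1.0); the trial scores here are synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# synthetic verification trials: 1 = target, 0 = non-target
labels = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([rng.normal(1.0, 1.0, 500),
                         rng.normal(0.0, 1.0, 500)])

auc = roc_auc_score(labels, scores)                # full ROC area
pauc = roc_auc_score(labels, scores, max_fpr=0.1)  # area over FPR <= 0.1
print(f"AUC = {auc:.3f}, standardized pAUC(0.1) = {pauc:.3f}")
# with max_fpr=1.0 the call returns exactly `auc`: AUC is the special
# case of pAUC taken over the full false-positive range
```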
“…where s_k is the score for trial k given by Equation (2), σ is the sigmoid function, and α and β are the calibration parameters, trained to minimize the quantity in Equation (3). To summarize, Equations (1), (2) and (5) show the pipeline that is applied to the embeddings in the standard PLDA-based backend. The parameters involved in these equations are all trained separately, freezing the parameters of the previous steps in order to obtain input data to train the next step.…”
Section: Standard PLDA-based Backend
confidence: 99%
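The calibration step described in the quoted statement, an affine map of the raw score passed through a sigmoid with α and β trained by cross-entropy, can be sketched as below. This is a reconstruction from the description assuming PyTorch, not the authors' code; the optimizer and hyperparameters are illustrative.

```python
import torch

def fit_calibration(scores, labels, epochs=200, lr=0.1):
    # p(target | trial k) = sigmoid(alpha * s_k + beta); alpha and beta
    # are trained to minimize binary cross-entropy over labeled trials
    alpha = torch.ones(1, requires_grad=True)
    beta = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([alpha, beta], lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(alpha * scores + beta, labels)
        loss.backward()
        opt.step()
    return alpha.item(), beta.item()

# usage: raw trial scores and 0/1 target labels as float tensors
# alpha, beta = fit_calibration(raw_scores, trial_labels)
```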
“…We propose a backend with the same functional form as the PLDA backend explained in the previous section, but where all parameters are optimized jointly, in a manner similar to the one used in [6] (though note that in this paper we only optimize jointly up to the backend stage, rather than the full pipeline as in Rohdin's paper). We first initialize all parameters in Equations (1), (2) and (5) as in the standard PLDA-based backend. Then, we fine-tune the parameters to optimize the cross-entropy in Equation (3) using some variant of stochastic gradient descent.…”
Section: Proposed Discriminative Backend
confidence: 99%
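A hedged sketch of the joint optimization the statement describes: a module with the same functional form (projection, bilinear PLDA-like scoring, affine calibration) whose parameters are all fine-tuned together by SGD after PLDA-based initialization. The module structure, names, and dimensions are our assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DiscriminativeBackend(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)  # LDA-like projection
        self.W = nn.Parameter(torch.eye(dim))        # PLDA-like bilinear term
        self.alpha = nn.Parameter(torch.ones(1))     # calibration scale
        self.beta = nn.Parameter(torch.zeros(1))     # calibration offset

    def forward(self, e1, e2):
        x1, x2 = self.proj(e1), self.proj(e2)
        score = (x1 @ self.W * x2).sum(dim=1)        # raw trial score
        return self.alpha * score + self.beta        # calibrated logit

backend = DiscriminativeBackend(dim=128)  # initialize from standard PLDA here
opt = torch.optim.SGD(backend.parameters(), lr=1e-3)
bce = torch.nn.BCEWithLogitsLoss()
# training loop over labeled trials (e1, e2, y):
#     loss = bce(backend(e1, e2), y); loss.backward(); opt.step()
```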
“…It has many applications, including noise cancellation, audio editing, and preprocessing for speech recognition. Denoting the noisy speech as y(t), we have y(t) = x(t) + n(t), (1) where x(t) and n(t) are respectively the clean speech and the noise, with t being the time index. Speech enhancement tries to recover the clean speech x from the noisy speech y.…”
Section: Introduction
confidence: 99%
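The additive model y(t) = x(t) + n(t) from the quoted introduction can be illustrated by mixing a clean signal with noise at a chosen SNR. A minimal NumPy sketch with synthetic signals; mix_at_snr and the 16 kHz rate are our illustrative choices, not part of the cited work.

```python
import numpy as np

def mix_at_snr(x, n, snr_db):
    # scale the noise so that 10*log10(P_x / P_n) equals snr_db,
    # then form the noisy observation y(t) = x(t) + n(t)
    gain = np.sqrt(np.sum(x ** 2) / (np.sum(n ** 2) * 10 ** (snr_db / 10)))
    return x + gain * n

t = np.arange(16000) / 16000.0             # one second at 16 kHz
x = np.sin(2 * np.pi * 220.0 * t)          # stand-in for clean speech x(t)
n = np.random.default_rng(0).standard_normal(t.size)  # stand-in for noise n(t)
y = mix_at_snr(x, n, snr_db=5.0)           # noisy speech y(t)
```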