2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2019
DOI: 10.1109/apsipaasc47483.2019.9023253

Triplet Based Embedding Distance and Similarity Learning for Text-independent Speaker Verification

Abstract: Speaker embeddings have become increasingly popular in the text-independent speaker verification task. In this paper, we propose two improvements to the training stage. Both improvements are based on triplets, because the training stage and the evaluation stage of the baseline x-vector system focus on different aims. First, we introduce a triplet loss that optimizes the Euclidean distances between embeddings while the multi-class cross-entropy loss is minimized. Second, we design an embedding similarity measurement …
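The first improvement pairs a speaker-classification objective with a triplet objective on Euclidean distances. The sketch below shows one plausible form of such a joint loss in NumPy. The additive combination, the weight `alpha`, the margin value and the function names are all assumptions for illustration; the abstract does not state the exact formulation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean multi-class (speaker-ID) cross entropy.
    logits: (N, C) unnormalized scores; labels: (N,) integer speaker indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def euclidean_triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge max(0, ||a - p|| - ||a - n|| + margin) over a batch of triplets.
    margin=0.2 is a hypothetical value, not taken from the paper."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)  # anchor vs. same-speaker embedding
    d_an = np.linalg.norm(anchor - negative, axis=1)  # anchor vs. different-speaker embedding
    return np.maximum(0.0, d_ap - d_an + margin).mean()

def joint_loss(logits, labels, anchor, positive, negative, alpha=1.0, margin=0.2):
    """Cross entropy plus an alpha-weighted triplet term (alpha is hypothetical)."""
    return cross_entropy(logits, labels) + alpha * euclidean_triplet_loss(
        anchor, positive, negative, margin
    )
```

A triplet whose negative is already farther from the anchor than the positive by at least the margin contributes zero, so only "hard" triplets push the embedding geometry, while the cross-entropy term keeps the speaker classes separable.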

Cited by 22 publications (5 citation statements)
References 18 publications (21 reference statements)
“…Unlike multi-class model that only takes class feature into account, the GE2E part can help getting similar embedding closer and separating different embedding apart. [9,12]. So we can expect that this…”
Section: Joint Multi-class and Similarity (mentioning)
confidence: 84%
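The GE2E term mentioned in this statement pulls each utterance embedding toward its own speaker's centroid and away from other speakers' centroids. Below is a minimal NumPy sketch of the softmax variant of the generalized end-to-end (GE2E) loss, assuming a batch of N speakers with M utterances each. Here `w` and `b` are fixed at values commonly used as initial values, whereas in GE2E they are learned; the averaging over utterances (rather than summing) is also a choice made for this sketch.

```python
import numpy as np

def _unit(x):
    """Scale vectors along the last axis to unit length so dot products are cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Softmax GE2E loss averaged over utterances.
    emb: (N speakers, M utterances, D) with M >= 2.
    w, b: similarity scale and bias, held fixed here (learnable in GE2E, w > 0)."""
    n_spk, n_utt, _ = emb.shape
    e = _unit(emb)
    centroids = _unit(emb.mean(axis=1))                                  # (N, D)
    # The own-speaker centroid leaves the current utterance out.
    excl = _unit((emb.sum(axis=1, keepdims=True) - emb) / (n_utt - 1))  # (N, M, D)
    sim = np.einsum("jid,kd->jik", e, centroids)                         # cos(e_ji, c_k)
    own = np.sum(e * excl, axis=-1)                                      # cos(e_ji, c_j^(-i))
    for j in range(n_spk):
        sim[j, :, j] = own[j]
    scores = w * sim + b
    # Per utterance: -S_ji,j + log sum_k exp(S_ji,k), via a stable log-sum-exp.
    top = scores.max(axis=2, keepdims=True)
    lse = top[..., 0] + np.log(np.exp(scores - top).sum(axis=2))
    return (lse - (w * own + b)).mean()
```

With perfectly separated speakers the loss approaches zero; with all embeddings collapsed onto one direction every centroid scores the same and the loss equals log N.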
“…Recently, Transformer-based models have demonstrated promising results in a variety of ASR and NLP tasks and are comparable to recurrent neural networks, as they can compute the attention weights in the whole input frame parallelly [6][7][8][9][10]. That ability would contribute to learning necessary feature from signal itself.…”
Section: Introduction (mentioning)
confidence: 99%
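The parallelism this statement refers to comes from computing attention weights for all frames with matrix products instead of a sequential recurrence. The sketch below is standard scaled dot-product attention in NumPy; it is assumed, not confirmed by the excerpt, that the cited Transformer models use this form.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q: (T_q, d), k: (T_k, d), v: (T_k, d_v); returns (T_q, d_v).
    All query-key scores come from one matrix product, so every input frame
    is attended to at once rather than step by step as in an RNN."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (T_q, T_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key frames
    return weights @ v
```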
“…In our approach we estimate the location likelihood of a platform using a modified version of VGG16 presented by [Kim, 2017]. The network is trained with a triplet margin loss [Veit et al, 2017] [Ren, 2019] [Hermans et al, 2017 based on a cosine distance between an anchor, positive and negative triple as shown in Eq. ( 1) and Eq.…”
Section: Approach and Experimental Results (mentioning)
confidence: 99%
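The triplet margin loss described in this statement is built on cosine distance rather than Euclidean distance. The citing paper's Eq. (1) is not reproduced here, so the following NumPy sketch uses the generic form max(0, d(a, p) - d(a, n) + margin) with d = 1 - cosine similarity; the function names and the margin value are hypothetical.

```python
import numpy as np

def cosine_distance(x, y):
    """Row-wise 1 - cos(x, y) for two (N, D) arrays."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - np.sum(x * y, axis=1)

def cosine_triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Mean max(0, d(a, p) - d(a, n) + margin) with cosine distance d.
    margin=0.5 is a hypothetical value."""
    d_ap = cosine_distance(anchor, positive)
    d_an = cosine_distance(anchor, negative)
    return np.maximum(0.0, d_ap - d_an + margin).mean()
```

Because cosine distance ignores vector length, only the direction of each embedding matters, which matches scoring setups that compare embeddings by cosine similarity.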
“…This vector is obtained from the output of a speaker verification model trained to minimize a triplet loss. This speaker verification model is pre-trained using pairs of utterances, similar to [18]. For every speaker, the vectors corresponding to all their utterances are pre-computed and then averaged to form the speaker vectors.…”
Section: Methods (mentioning)
confidence: 99%
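The last statement forms one vector per speaker by averaging the pre-computed vectors of all that speaker's utterances. A minimal sketch of that step, assuming the utterance vectors are stored in a dictionary keyed by a hypothetical speaker ID:

```python
import numpy as np

def speaker_vectors(utterance_embeddings):
    """Map each speaker ID to the mean of its pre-computed utterance vectors.
    utterance_embeddings: dict speaker_id -> list of equal-length 1-D arrays
    (this layout is an assumption for illustration)."""
    return {
        spk: np.mean(np.stack(vecs), axis=0)
        for spk, vecs in utterance_embeddings.items()
    }
```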