The Speaker and Language Recognition Workshop (Odyssey 2018) 2018
DOI: 10.21437/odyssey.2018-49

Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model

Abstract: Embeddings in machine learning are low-dimensional representations of complex input patterns, with the property that simple geometric operations like Euclidean distances and dot products can be used for classification and comparison tasks. We introduce meta-embeddings, which live in more general inner product spaces and which are designed to better propagate uncertainty through the embedding bottleneck. Traditional embeddings are trained to maximize between-class and minimize within-class distances. Meta-embed…

Citations: cited by 29 publications (32 citation statements)
References: 21 publications
“…The latent identity variable framework [26] assumes that y is a pure representation of a person's identity and that there is a distribution on Y with known probability density function p(y). Given a likelihood function for the latent identity variable (e.g., meta-embedding [28]), one can make inferences about speaker identities within a set of speech utterances. Examples of such tasks include speaker verification, identification and clustering [29].…”
Section: Reinterpreting False Alarm Rate As Averaged Speaker-pair Con… (mentioning)
confidence: 99%
“…This means that we have to find approximations for both scoring and training. We make use of a new approximation, the Gaussian likelihood approximation, as recently published in [1]. In that paper, the approximation was used for both scoring and discriminative training.…”
Section: HT-PLDA Model (mentioning)
confidence: 99%
“…Both scoring and training recipes can be built around the likelihood for the hidden speaker identity variable, given the observation. Marginalization over the hidden variable, λij, gives a multivariate t-distribution for the observed vector [8,9,1]:…”
Section: The Gaussian Likelihood Approximation (mentioning)
confidence: 99%
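The marginalization mentioned in the statement above can be illustrated numerically. The sketch below is a univariate toy case, not the cited paper's model: it assumes the hidden variable λ is a precision scale with a Gamma(ν/2, ν/2) prior (shape, rate) and that the observation is zero-mean Gaussian with precision λ. This is the standard construction of a Student's t-distribution, and the quoted statement does not show the exact parameterization used in the paper. The function names and the integration settings (`upper`, `steps`) are illustrative choices.

```python
import math


def student_t_pdf(x, nu):
    """Closed-form density of a standard univariate Student's t with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def gamma_pdf(lam, shape, rate):
    """Gamma density in the shape/rate parameterization, for lam > 0."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1) * math.log(lam) - rate * lam)
    return math.exp(log_pdf)


def normal_pdf(x, precision):
    """Zero-mean Gaussian density with the given precision (inverse variance)."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-0.5 * precision * x * x)


def marginal_pdf(x, nu, upper=60.0, steps=20_000):
    """Midpoint-rule integral over lam of N(x | 0, 1/lam) * Gamma(lam | nu/2, nu/2).

    Assumption: the integration range (0, upper] captures essentially all of the
    gamma prior's mass, which holds for moderate nu.
    """
    h = upper / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h  # midpoint of the k-th cell, always > 0
        total += normal_pdf(x, lam) * gamma_pdf(lam, nu / 2, nu / 2)
    return total * h


if __name__ == "__main__":
    nu = 4.0
    for x in (0.0, 1.0, 2.5):
        print(x, marginal_pdf(x, nu), student_t_pdf(x, nu))
```

Under these assumptions the two printed columns should agree closely, showing that integrating out the precision scale turns a Gaussian into a heavy-tailed t density. The multivariate case in the statement follows the same pattern with a vector observation.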
“…The recent work in this direction focuses on using the speaker embeddings that are scored with a probabilistic linear discriminant analysis (PLDA) based back-end [18,19]. This kind of systems give comparable or better results to that obtained with i-vector speaker modeling.…”
Section: Introduction (mentioning)
confidence: 99%