The Speaker and Language Recognition Workshop (Odyssey 2016) 2016
DOI: 10.21437/odyssey.2016-31

On autoencoders in the i-vector space for speaker recognition

Abstract: We present a detailed empirical investigation of the speaker verification system based on a denoising autoencoder (DAE) in the i-vector space, first proposed in [1]. This paper describes the system and discusses practical issues in training it. The aim of this investigation is to study the properties of the DAE in the i-vector space and to analyze different strategies for initializing and training the back-end parameters. We also propose several improvements to our system to …

Citations: cited by 37 publications (19 citation statements)
References: 10 publications
“…In most settings, DNNs are used as a replacement for Gaussian mixture models (GMMs) to improve the conventional i-vector approach [1] by having a more phonetically aware Universal Background Model (UBM) [2,3,4]. Other subsequent methods based on DNNs were introduced for noise-robust and domain-invariant i-vectors [5,6,7]. However, the process of training the GMM-UBM and extracting i-vectors largely remained the same.…”
Section: Introduction (mentioning)
confidence: 99%
“…That prevented us from using any stand-alone VAD. The reader can refer to [7] for more DNN implementation details. 20 MFCCs (including log energy) were calculated using 23 filter banks in the range of 20-3700 Hz with their first- and second-order derivatives.…”
Section: DNN-Based I-vector System (mentioning)
confidence: 99%
“…Aside from the standard PLDA we studied the application of a denoising autoencoder (DAE) based back-end [6,7] to SITW data "in the wild" conditions.…”
Section: DAE-based Back-end (mentioning)
confidence: 99%