Interspeech 2017
DOI: 10.21437/interspeech.2017-430

CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances

Abstract: Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel method, which trains a convolutional neural network (CNN) model to map the i-vectors extracted from short utterances to…
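The abstract sketches the core idea: learn a regression from short-utterance i-vectors to their long-utterance counterparts. Below is a minimal, hypothetical PyTorch sketch of such a CNN-based mapping trained with an MSE objective; the i-vector dimensionality (400), layer sizes and loss are illustrative assumptions and do not reproduce the paper's joint-mapping architecture.

```python
# Hypothetical sketch (not the paper's exact architecture): a 1-D CNN that maps
# a short-utterance i-vector to an estimate of the corresponding long-utterance
# i-vector, trained with an MSE regression objective.
import torch
import torch.nn as nn

IVEC_DIM = 400  # assumed i-vector dimensionality

class ShortToLongMapper(nn.Module):
    def __init__(self, dim=IVEC_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2),  # treat the i-vector as a 1-channel signal
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.fc = nn.Linear(32 * dim, dim)  # project back to i-vector space

    def forward(self, x):
        # x: (batch, dim) short-utterance i-vectors
        h = self.conv(x.unsqueeze(1))       # (batch, 32, dim)
        return self.fc(h.flatten(1))        # (batch, dim) estimated long-utterance i-vectors

# Toy training loop on random (short, long) i-vector pairs.
model = ShortToLongMapper()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
short_ivecs = torch.randn(64, IVEC_DIM)
long_ivecs = torch.randn(64, IVEC_DIM)
for _ in range(5):
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(model(short_ivecs), long_ivecs)
    loss.backward()
    optimiser.step()
```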

Cited by 12 publications (7 citation statements)
References 9 publications
“…Speaker recognition was investigated in [16]. Gaussian mixtures were used as the main classification method for speaker recognition.…”
Section: Literature Review and Problem Statement (mentioning)
confidence: 99%
“…$w = (I + T^{\top}\Sigma^{-1}NT)^{-1}T^{\top}\Sigma^{-1}F$ (16), where N and F are the matrices composed of the zero- and first-order statistics, and $\Sigma$ is the covariance matrix of F. These i-vectors will have information of the language contained in the utterance they represent, since that is the task for which the T matrix has been trained. Voice recognition bottleneck features [35].…”
Section: Classifier System: Deep Neural Network (mentioning)
confidence: 99%
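For context, the quoted Eq. (16) is the standard i-vector point estimate computed from Baum-Welch statistics. The following NumPy sketch illustrates that formula under assumed toy dimensions; the variable names and sizes are illustrative and not taken from the citing paper.

```python
# Illustrative sketch (assumed toy dimensions) of the i-vector point estimate
# w = (I + T' Σ^{-1} N T)^{-1} T' Σ^{-1} F referenced in the quoted Eq. (16).
import numpy as np

C, D, R = 8, 20, 10                  # assumed: GMM components, feature dim, i-vector dim
CD = C * D
rng = np.random.default_rng(0)

T = rng.standard_normal((CD, R))                 # total-variability matrix
Sigma = np.abs(rng.standard_normal(CD)) + 0.1    # diagonal covariance, stored as a vector
n = np.abs(rng.standard_normal(C))               # zero-order statistics per component
F = rng.standard_normal(CD)                      # centred first-order statistics, stacked

N_diag = np.repeat(n, D)                         # diagonal of N, expanded to CD entries
TtSinv = T.T / Sigma                             # T' Σ^{-1}
precision = np.eye(R) + TtSinv @ (N_diag[:, None] * T)   # I + T' Σ^{-1} N T
w = np.linalg.solve(precision, TtSinv @ F)       # i-vector point estimate
print(w.shape)                                   # (R,)
```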
“…The SSAE architecture can also be used to estimate the HB components directly from the regression layer. A similar CNN based architecture designed to regularise the mapping of short i-vectors to long i-vectors for a speaker diarization task is reported in [29]. The focus here is different, i.e., to regularise/supervise dimensionality reduction so that it preserves information critical to ABE.…”
Section: Application to ABE (mentioning)
confidence: 99%
“…Since proposed in [1], the i-vector has become the state-of-the-art speaker modeling technique; it is a simple but elegant factor analysis model, inspired by the Joint Factor Analysis (JFA) [2] framework. Though some researchers have been working on improving the i-vector model itself [3,4], more researchers pay attention to compensation techniques in the i-vector space [5,6,7,8]. JFA can be regarded as a compensation method in the GMM super-vector space, which models the speaker and channel variabilities separately.…”
Section: Introduction (mentioning)
confidence: 99%
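To make the distinction drawn in this statement concrete, the standard textbook formulations (not quoted from the page) are: JFA decomposes the GMM mean supervector into separate speaker and channel subspaces, whereas the i-vector model collapses both into a single total-variability subspace.

```latex
% Textbook forms, stated here for context:
% JFA separates speaker (Vy + Dz) and channel (Ux) variability;
% the i-vector model uses one total-variability term Tw.
\begin{align}
  \text{JFA:}      \quad & M = m + Vy + Ux + Dz \\
  \text{i-vector:} \quad & M = m + Tw
\end{align}
```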
“…The authors in [7] proposed using an auto-encoder to learn a projection that maps noisy i-vectors to de-noised ones. To address the short-duration problem of i-vectors [20], a Convolutional Neural Network (CNN) based system was trained in [8] to map the i-vectors extracted from short utterances to the corresponding long-utterance i-vectors.…”
Section: Introduction (mentioning)
confidence: 99%