2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2017.7953181

Intra-class covariance adaptation in PLDA back-ends for speaker verification

Abstract: Multi-session training conditions are becoming increasingly common in recent benchmark datasets for both text-independent and text-dependent speaker verification. In the state-of-the-art i-vector framework for speaker verification, such conditions are addressed by simple techniques such as averaging the individual i-vectors, averaging scores, or modifying the Probabilistic Linear Discriminant Analysis (PLDA) scoring hypothesis for multi-session enrollment. The aforementioned techniques fail to exploit the speak…
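
The abstract contrasts two simple multi-session enrollment strategies: averaging the enrollment i-vectors before scoring, versus scoring each session and averaging the scores. Below is a minimal sketch of that contrast, with plain cosine scoring standing in for a full PLDA back-end; the dimensions and inputs are illustrative placeholders, not values or methods from the paper.

```python
# Sketch of the two simple multi-session enrollment strategies the abstract
# mentions. Cosine similarity is used in place of PLDA scoring for brevity.
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two i-vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_ivector_averaging(enroll_ivectors: np.ndarray, test_ivector: np.ndarray) -> float:
    """Average the enrollment i-vectors first, then score once."""
    model = enroll_ivectors.mean(axis=0)
    return cosine_score(model, test_ivector)

def score_score_averaging(enroll_ivectors: np.ndarray, test_ivector: np.ndarray) -> float:
    """Score the test i-vector against each enrollment session, then average."""
    return float(np.mean([cosine_score(e, test_ivector) for e in enroll_ivectors]))

# Toy trial: 3 enrollment sessions of 400-dimensional i-vectors.
rng = np.random.default_rng(0)
enroll = rng.standard_normal((3, 400))
test = enroll.mean(axis=0) + 0.1 * rng.standard_normal(400)  # same-speaker-like trial
print(score_ivector_averaging(enroll, test))
print(score_score_averaging(enroll, test))
```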

Cited by 2 publications (2 citation statements)
References 13 publications (15 reference statements)
“…Automatic speaker verification (ASV) is an easy-to-use biometric authentication technology based on speech. State-of-the-art ASV systems, such as i-vector-based [2][3][4] and probabilistic linear discriminant analysis (PLDA)-based ones [5][6][7], have achieved reliable performance and are expected to come into practical use. Meanwhile, the performance of speech synthesis algorithms such as text-to-speech (TTS) systems [8][9][10] and voice conversion systems [11,12] has improved significantly.…”
Section: Introduction (mentioning)
confidence: 99%
“…To extract i-vectors, Baum-Welch statistics are computed from a Gaussian Mixture Model-Universal Background Model (GMM-UBM), which is trained on sequences of feature vectors. I-vectors can then be used to compare utterances directly, using cosine similarity or probabilistic linear discriminant analysis (PLDA) [14,15,16]. To improve upon i-vectors, deep neural networks (DNNs) were first applied to gradually replace individual steps of the i-vector pipeline in traditional speaker recognition systems [17,18].…”
Section: Related Work (mentioning)
confidence: 99%
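
The passage above describes accumulating Baum-Welch statistics against a GMM-UBM as the input to i-vector extraction. The following is a hedged sketch of the standard zeroth- and first-order statistics, approximating the UBM with scikit-learn's GaussianMixture; the feature dimensions and random data are illustrative assumptions, not details from the cited works.

```python
# Zeroth- and first-order Baum-Welch statistics of an utterance w.r.t. a UBM.
import numpy as np
from sklearn.mixture import GaussianMixture

def baum_welch_stats(ubm: GaussianMixture, features: np.ndarray):
    """Accumulate sufficient statistics for one utterance.

    features: (T, D) array of frame-level feature vectors (e.g. MFCCs).
    Returns N with shape (C,) and F with shape (C, D), where C is the
    number of UBM components.
    """
    gamma = ubm.predict_proba(features)  # (T, C) per-frame component posteriors
    N = gamma.sum(axis=0)                # N_c = sum_t gamma_t(c)
    F = gamma.T @ features               # F_c = sum_t gamma_t(c) * x_t
    return N, F

# Toy usage: fit a small "UBM" on random 20-dim features, then accumulate
# statistics for a single 300-frame utterance.
rng = np.random.default_rng(0)
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(rng.standard_normal((5000, 20)))
N, F = baum_welch_stats(ubm, rng.standard_normal((300, 20)))
print(N.shape, F.shape)  # (8,) (8, 20)
```

These statistics, rather than the raw frames, are what the total-variability model consumes when estimating an utterance's i-vector.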