2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6853692

Text-dependent GMM-JFA system for password based speaker verification

Cited by 22 publications (14 citation statements); references 6 publications.

“…In Hébert (2008), text dependent speaker verification is defined as a speaker verification task in which the lexicon used in the test phase is a subset of the lexicon pronounced by the speaker during the enrollment. By constraining the text of enrollment and testing utterances to be the same (verbal password), higher accuracy with shorter utterances can be achieved (Larcher et al, 2014b; Novoselov et al, 2014; Kenny et al, 2014; Variani et al, 2014). In Larcher et al (2014b), the hierarchical multi-layer acoustic model (HiLAM) was shown to outperform the conventional i-vector approach, since the latter does not explicitly take advantage of the temporal structure of the text dependent speech utterances.…”
Section: Introduction (mentioning), confidence: 99%

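The lexicon-subset definition quoted above can be made concrete with a small check. This is a minimal sketch, assuming each utterance is available as a whitespace-separated word transcript; the function name `is_text_dependent_trial` is hypothetical and does not come from the cited work.

```python
def is_text_dependent_trial(enroll_transcripts, test_transcript):
    """Return True if the test lexicon is a subset of the enrollment lexicon.

    Assumption: transcripts are plain whitespace-separated word strings.
    """
    # Lexicon pronounced by the speaker across all enrollment utterances
    enrolled = {w.lower() for t in enroll_transcripts for w in t.split()}
    # Lexicon used in the test utterance
    tested = {w.lower() for w in test_transcript.split()}
    return tested <= enrolled


# Illustrative digit-passphrase data (not from the source)
print(is_text_dependent_trial(["one two three", "four five"], "five two one"))  # True
print(is_text_dependent_trial(["one two three"], "one nine"))  # False
```

A fixed verbal password is the strictest case of this definition: the test lexicon equals the enrollment lexicon, and the word order is also fixed.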
“…In this system HMM segmentation is used to split a passphrase into individual digits. For each digit a State-GMM mean supervector is extracted as described in [7]. Each state is associated with a unique speaker-independent UBM, which is trained on the RSR2015 database training set.…”
Section: State-GMM-SVM (mentioning), confidence: 99%

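The passage above defers the supervector extraction details to its reference [7], so those details are not reproduced here. As a hedged illustration only, the sketch below assumes the common relevance-MAP adaptation of a diagonal-covariance UBM's means, applied per HMM state. It also assumes the HMM segmentation has already grouped the frames by state. The function names and the relevance factor of 16 are assumptions, not values from the source.

```python
import numpy as np


def map_mean_supervector(frames, ubm_weights, ubm_means, ubm_vars, relevance=16.0):
    """Relevance-MAP adapt diagonal-covariance GMM means to `frames` and stack them.

    frames:      (T, D) features aligned to one HMM state (assumed given)
    ubm_weights: (C,)   mixture weights of that state's UBM
    ubm_means:   (C, D) component means
    ubm_vars:    (C, D) diagonal variances
    """
    # Log-density of each frame under each diagonal Gaussian, shape (T, C)
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars[None], axis=2)
                        + np.sum(np.log(2 * np.pi * ubm_vars), axis=1)[None])
    log_post = np.log(ubm_weights)[None] + log_gauss
    # Component posteriors, shifted by the row max for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order Baum-Welch statistics
    n = post.sum(axis=0)   # (C,)
    f = post.T @ frames    # (C, D)
    # Interpolate between the data mean and the UBM mean per component
    alpha = (n / (n + relevance))[:, None]
    data_mean = f / np.maximum(n, 1e-10)[:, None]
    adapted = alpha * data_mean + (1.0 - alpha) * ubm_means
    return adapted.reshape(-1)  # (C * D,) supervector for this state


def passphrase_supervector(state_frames, state_ubms, relevance=16.0):
    """Concatenate per-state supervectors for one segmented passphrase.

    state_frames: list of (T_s, D) arrays, one per HMM state (hypothetical input)
    state_ubms:   list of (weights, means, vars) tuples, one UBM per state
    """
    return np.concatenate([map_mean_supervector(x, *ubm, relevance=relevance)
                           for x, ubm in zip(state_frames, state_ubms)])
```

States that receive few frames stay close to their UBM means, because `alpha` shrinks toward zero as the occupancy count `n` falls. This is one reason per-state models can be used with short passphrases.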
“…The training set is also extended with Wells Fargo Bank dataset (WF), described in [7], which contains short digit passphrase utterances, and the training part of STC-Russiandigits dataset.…”
Section: Databases Description (mentioning), confidence: 99%

“…All i-vectors are length normalized and further regularized using the phrase-dependent Within-class Covariance Normalization (WCCN). A simple cosine distance scoring is used followed by phrase-dependent s-norm score normalization [10]. 19 Mel-Frequency Cepstral Coefficients (MFCC) + log energy is used as the baseline features.…”
Section: Baseline (mentioning), confidence: 99%

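The back-end in the Baseline passage has four steps: length normalization, WCCN, cosine scoring, and s-norm. It can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions. The quoted system applies WCCN and s-norm per phrase, whereas this sketch fits a single WCCN projection and scores against a single impostor cohort. All function names are hypothetical.

```python
import numpy as np


def length_normalize(x):
    """Scale each row (one i-vector per row) to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def train_wccn(ivectors, labels):
    """Return a lower-triangular B with B @ B.T = inv(W).

    W is the average within-class covariance over speakers.
    Apply the projection to row-vector i-vectors as `x @ B`.
    """
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    classes = np.unique(labels)
    for c in classes:
        xc = ivectors[labels == c]
        xc = xc - xc.mean(axis=0)
        W += xc.T @ xc / len(xc)
    W /= len(classes)
    return np.linalg.cholesky(np.linalg.inv(W))


def cosine_score(enroll, test):
    """Cosine similarity between two i-vectors."""
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))


def s_norm(score, enroll, test, cohort):
    """Symmetric normalization: average of the enrollment-side and test-side z-scores."""
    e_scores = np.array([cosine_score(enroll, c) for c in cohort])
    t_scores = np.array([cosine_score(test, c) for c in cohort])
    return 0.5 * ((score - e_scores.mean()) / e_scores.std()
                  + (score - t_scores.mean()) / t_scores.std())
```

With rows projected as `x @ B`, the cosine numerator becomes `x1 @ inv(W) @ x2`. This down-weights directions with high within-speaker variability. In a phrase-dependent setup, one would presumably fit `train_wccn` separately on each phrase's development i-vectors and draw the s-norm cohort from the same phrase.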