Text-dependent GMM-JFA system for password based speaker verification

Novoselov, Sergey; Pekhovsky, Timur; Shulipa, Andrey; Sholokhov, Alexey

doi:10.1109/icassp.2014.6853692

Cited by 22 publications

(14 citation statements)

References 6 publications


“…In Hébert (2008), text dependent speaker verification is defined as a speaker verification task in which the lexicon used in the test phase is a subset of the lexicon pronounced by the speaker during the enrollment. By constraining the text of enrollment and testing utterances to be the same (verbal password), higher accuracy with shorter utterances can be achieved (Larcher et al, 2014b; Novoselov et al, 2014; Kenny et al, 2014; Variani et al, 2014). In Larcher et al (2014b), the hierarchical multi-layer acoustic model (HiLAM) was shown to outperform the conventional i-vector approach, since the latter does not explicitly take advantage of the temporal structure of the text dependent speech utterances.…”

Section: Introduction (mentioning)

confidence: 99%

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Kim, Lammert, et al. 2016

Computer Speech & Language


We propose a practical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks.
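The two fusion strategies named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the acoustic (MFCC) and articulatory streams are already frame-synchronous, and the fusion weights and bias are hypothetical placeholders that would normally be trained on development trials.

```python
def feature_level_fusion(mfcc_frames, artic_frames):
    """Concatenate each acoustic frame with its articulatory frame.

    Assumes both streams are already frame-synchronous (same frame rate
    and number of frames); real data may first need resampling.
    """
    if len(mfcc_frames) != len(artic_frames):
        raise ValueError("streams must have the same number of frames")
    return [list(m) + list(a) for m, a in zip(mfcc_frames, artic_frames)]


def score_level_fusion(score_sets, weights, bias=0.0):
    """Weighted linear fusion of per-trial scores from several subsystems.

    score_sets: one list of trial scores per subsystem, same trial order.
    weights, bias: hypothetical values; in practice they are trained on
    development trials (for example with logistic regression).
    """
    if len(score_sets) != len(weights):
        raise ValueError("need one weight per subsystem")
    n_trials = len(score_sets[0])
    if any(len(s) != n_trials for s in score_sets):
        raise ValueError("all subsystems must score the same trials")
    return [bias + sum(w * s[i] for w, s in zip(weights, score_sets))
            for i in range(n_trials)]


# Example: 2 frames of 2-dim "MFCCs" plus 1-dim articulatory features,
# and two subsystems scoring the same two trials.
fused_frames = feature_level_fusion([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.6]])
# fused_frames == [[1.0, 2.0, 0.5], [3.0, 4.0, 0.6]]
fused_scores = score_level_fusion([[1.0, -1.0], [0.5, 0.0]], [0.7, 0.3])
```

Feature-level fusion changes the input to a single back-end, while score-level fusion keeps the subsystems separate and only combines their outputs, which is why the latter is easier to apply when one stream (here, inverted articulatory trajectories) is noisy or only partly reliable.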


Section: Introduction (mentioning)

confidence: 99%

Speaker verification based on the fusion of speech acoustics and inverted articulatory signals

Kim, Lammert, et al. 2016

Computer Speech & Language


“…In this system HMM segmentation is used to split a passphrase into individual digits. For each digit a State-GMM mean supervector is extracted as described in [7]. Each state is associated with a unique speaker-independent UBM, which is trained on the RSR2015 database training set.…”

Section: State-GMM-SVM (mentioning)

confidence: 99%
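A state-level mean supervector of the kind this excerpt refers to is commonly obtained by relevance-MAP adaptation of a state UBM's means to the frames aligned with that state, then stacking the adapted means. The sketch below assumes a diagonal-covariance UBM per state and a relevance factor of 16 (a common default, chosen here as an assumption); it illustrates classic relevance MAP and is not necessarily the exact extraction procedure described in [7].

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_mean_supervector(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adapt one state UBM's means to `frames`, then stack them.

    weights/means/variances: the state's UBM (diagonal covariances).
    relevance: MAP relevance factor (assumed value).
    """
    n_comp, dim = len(means), len(means[0])
    N = [0.0] * n_comp                          # zeroth-order statistics
    F = [[0.0] * dim for _ in range(n_comp)]    # first-order statistics
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)                         # for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for c in range(n_comp):
            g = post[c] / total                 # component responsibility
            N[c] += g
            for d in range(dim):
                F[c][d] += g * x[d]
    # Adapted mean: (F_c + r * mu_c) / (N_c + r), i.e. an interpolation
    # between the data mean F_c / N_c and the UBM mean mu_c.
    return [(F[c][d] + relevance * means[c][d]) / (N[c] + relevance)
            for c in range(n_comp) for d in range(dim)]
```

With no frames the supervector equals the stacked UBM means, and the more frames a component absorbs, the further its adapted mean moves toward the data; per-digit supervectors built this way can then be fed to a classifier such as an SVM.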

“…The training set is also extended with Wells Fargo Bank dataset (WF), described in [7], which contains short digit passphrase utterances, and the training part of STC-Russiandigits dataset.…”

Section: Databases Description (mentioning)

confidence: 99%

Deep CNN Based Feature Extractor for Text-Prompted Speaker Recognition

Novoselov, Kudashev, Shchemelinin, et al. 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Deep learning is still not a very common tool in the speaker verification field. We study deep convolutional neural network performance in the text-prompted speaker verification task. The prompted passphrase is segmented into word states, i.e. digits, to test each digit utterance separately. We train a single high-level feature extractor for all states and use the cosine similarity metric for scoring. The key feature of our network is the Max-Feature-Map activation function, which acts as an embedded feature selector. By using a multitask learning scheme to train the high-level feature extractor, we were able to surpass the classic baseline systems in terms of quality and achieved impressive results for such a novel approach, getting 2.85% EER on the RSR2015 evaluation set. Fusion of the proposed and the baseline systems improves this result.
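Two pieces of this abstract, the Max-Feature-Map (MFM) activation and per-digit cosine scoring, can be sketched in a few lines. The MFM below follows the Light CNN formulation (split the channels in half and keep the element-wise maximum), which is assumed rather than confirmed to match the cited network; averaging the per-digit cosine scores into one passphrase score is likewise a hypothetical aggregation rule.

```python
import math


def max_feature_map(channels):
    """Max-Feature-Map over the channel axis.

    `channels` holds 2k feature maps (each flattened to a list); the
    output keeps k maps, the element-wise max of map i and map i + k.
    Channels that lose the competition are dropped, which is why MFM
    is described as an embedded feature selector.
    """
    if len(channels) % 2:
        raise ValueError("MFM needs an even number of channels")
    k = len(channels) // 2
    return [[max(a, b) for a, b in zip(channels[i], channels[i + k])]
            for i in range(k)]


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def score_passphrase(enroll_digit_embs, test_digit_embs):
    """Score each digit separately, then average (assumed aggregation)."""
    scores = [cosine_similarity(e, t)
              for e, t in zip(enroll_digit_embs, test_digit_embs)]
    return sum(scores) / len(scores)
```

Because one extractor serves all digit states, enrollment and test embeddings for the same digit live in the same space and can be compared directly with the cosine.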


“…All i-vectors are length normalized and further regularized using the phrase-dependent Within-class Covariance Normalization (WCCN). A simple cosine distance scoring is used followed by phrase-dependent s-norm score normalization [10]. 19 Mel-Frequency Cepstral Coefficients (MFCC) + log energy is used as the baseline features.…”

Section: Baseline (mentioning)

confidence: 99%
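The back-end described in this excerpt (length normalization, cosine scoring, s-norm) can be sketched as follows. This is a minimal illustration: the WCCN step is omitted to keep it dependency-free, and the impostor cohort is a hypothetical input that, for a phrase-dependent s-norm, would be restricted to utterances of the same phrase.

```python
import math
import statistics


def length_normalize(v):
    """Scale an i-vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_score(u, v):
    # After length normalization the cosine reduces to a dot product.
    u, v = length_normalize(u), length_normalize(v)
    return sum(a * b for a, b in zip(u, v))


def s_norm(enroll, test, cohort):
    """Symmetric score normalization against an impostor cohort.

    Combines a Z-norm-like term (enrollment side) and a T-norm-like term
    (test side); the cohort scores must not all be identical, or the
    standard deviation is zero.
    """
    raw = cosine_score(enroll, test)
    e_scores = [cosine_score(enroll, c) for c in cohort]
    t_scores = [cosine_score(test, c) for c in cohort]
    z = (raw - statistics.mean(e_scores)) / statistics.pstdev(e_scores)
    t = (raw - statistics.mean(t_scores)) / statistics.pstdev(t_scores)
    return 0.5 * (z + t)
```

The symmetric form gives the same score whichever utterance is treated as enrollment, which is the main reason s-norm is often preferred over applying Z-norm or T-norm alone.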

On Residual CNN in Text-Dependent Speaker Verification Task

Malykh, Novoselov, Kudashev 2017

Speech and Computer

Self Cite


Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
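The results on this page are reported as equal error rates (EER) and relative improvements. A simple way to compute both is sketched below; it sweeps thresholds over the observed scores and is only an approximation, not the scoring tool used by the authors (evaluation toolkits usually interpolate on the ROC or DET curve).

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the threshold where miss and false-alarm rates meet.

    Sweeps every observed score as a threshold, keeps the one where the
    false-rejection rate (FRR) and false-acceptance rate (FAR) are
    closest, and returns their average.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.18 for an 18% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer
```

For example, a system that lowers EER from 10.0% to 8.2% has a relative reduction of 0.18, i.e. 18% relative, even though the absolute gain is only 1.8 percentage points.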
