Lexicon-Based Local Representation for Text-Dependent Speaker Verification

You, Hanxu; Li, Lianqiang; Zhu, Jie

doi:10.1587/transinf.2016edl8182

Cited by 4 publications (3 citation statements)

References 6 publications (6 reference statements)


“…Furthermore, as we expected, the general performance is worse since the system suffers from the lexical similarity of the short commands. Thus, this part is more challenging than Part I as we can also see in other previous works [15,16].…”

Section: Experiments with RSR-Part II (mentioning; confidence: 80%)

“…The application of DNNs and the same techniques as in text-independent models for text-dependent speaker verification tasks has produced mixed results. On the one hand, specific modifications of the traditional techniques have been shown to be successful for text-dependent tasks such as i-vector+PLDA/Support Vector Machines (SVM) [14][15][16], DNNs bottleneck as features for i-vector extractors [17] or posterior probabilities for i-vector extractors [17,18]. On the other hand, speaker embeddings obtained directly from a DNN have provided good results in tasks with large amounts of data and a single phrase [19], but they have not been as effective in tasks with more than one pass phrase and smaller database sizes [4,5].…”

Section: Introduction (mentioning; confidence: 99%)


Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification

et al. 2019


In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance by global average pooling over the temporal dimension. Our system replaces this reduction mechanism with a phonetic phrase alignment model that keeps the temporal structure of each phrase, since phonetic information is relevant to the verification task. Moreover, we can apply a convolutional neural network as a front-end and, because the alignment process is differentiable, train the network to produce a supervector for each utterance that is discriminative for both the speaker and the phrase. The advantage of this choice is that the supervector encodes both phrase and speaker information, providing good performance in text-dependent speaker verification tasks. Verification is performed using a basic similarity metric. The new model, which uses alignment to produce supervectors, was evaluated on the RSR2015-Part I database and gave competitive results compared to networks of similar size that use global average pooling to extract embeddings. We also evaluated this proposal on RSR2015-Part II. To our knowledge, this system achieves the best published results on this second part.
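The abstract contrasts global average pooling over time with alignment-based pooling into a supervector. The toy sketch below illustrates why the distinction matters for text-dependent verification. It is not the paper's implementation: it assumes the alignment is already given as a T x K weight matrix (one row per frame, one column per phrase state), uses cosine similarity as the "basic similarity metric", and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def global_average_pooling(frames: Matrix) -> List[float]:
    """Average T x D frame features over time; temporal order is discarded."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[j] for f in frames) / num_frames for j in range(dim)]


def alignment_pooling(frames: Matrix, align: Matrix) -> List[float]:
    """Pool frames per alignment state and concatenate into a K*D supervector.

    Assumption (hypothetical interface): align is a T x K matrix whose entry
    [t][k] is the weight of frame t for phrase state k. For each state k the
    result holds the alignment-weighted mean of the frames assigned to it.
    """
    num_states = len(align[0])
    dim = len(frames[0])
    supervector: List[float] = []
    for k in range(num_states):
        weight = sum(a[k] for a in align)
        for j in range(dim):
            acc = sum(a[k] * f[j] for a, f in zip(align, frames))
            # States that receive no frames contribute a zero block.
            supervector.append(acc / weight if weight > 0 else 0.0)
    return supervector


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity used here as the verification score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0
```

For example, two utterances containing the same two "sounds" `[[1.0, 0.0], [0.0, 1.0]]` and `[[0.0, 1.0], [1.0, 0.0]]`, but in opposite order, have identical global averages (cosine score of about 1.0). With the hard alignment `[[1.0, 0.0], [0.0, 1.0]]` (frame t assigned to state t), their supervectors are orthogonal (score 0.0). Reusing one alignment for both utterances is a simplification. In the described system, each utterance would be aligned by the phrase model.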


“…The verification performance of 1.22% equal error rate (EER) is achieved. A lexicon-based local representation algorithm for text-dependent i-vector speaker verification system is presented in [18]. The speaker recognition system based on Gaussian mixture model-based support vector machine (GMM-SVM) and the nuisance attribute projection (NAP) technique for channel compensation is presented in [19]. Time alignment of different utterances is a serious problem for distance measures and small shift would lead to incorrect identification in text-dependent speaker recognition.…”

Section: Review of Related Work (mentioning; confidence: 99%)
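The last statement notes that time misalignment between utterances breaks naive distance measures. A classic remedy in text-dependent systems is dynamic time warping (DTW), which is our choice of illustration here. The quoted work does not say which alignment method it relies on. The minimal sketch below assumes one feature vector per frame and Euclidean frame distance. It compares a frame-by-frame distance with a DTW distance. All names are hypothetical.

```python
import math
from typing import List, Sequence

Frames = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_by_frame_distance(x: Frames, y: Frames) -> float:
    """Naive distance: frame t of x is compared only with frame t of y."""
    if len(x) != len(y):
        raise ValueError("frame-by-frame comparison needs equal-length sequences")
    return sum(euclidean(a, b) for a, b in zip(x, y))


def dtw_distance(x: Frames, y: Frames) -> float:
    """Minimum cumulative frame distance over all monotonic alignments of x and y."""
    n, m = len(x), len(y)
    # cost[i][j] = best cumulative distance aligning x[:i] with y[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            # Allowed steps: advance in x only, in y only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For a 1-D template `[[0.0], [0.0], [1.0], [2.0], [1.0], [0.0]]` and the same contour shifted one frame earlier, `[[0.0], [1.0], [2.0], [1.0], [0.0], [0.0]]`, the frame-by-frame distance is 4.0 while the DTW distance is 0.0. A one-frame shift inflates the naive distance, but DTW absorbs it. DTW also handles utterances of different lengths, which the naive measure rejects.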