2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2017
DOI: 10.1109/asru.2017.8268995

Comparison of multiple features and modeling methods for text-dependent speaker verification

Abstract: Text-dependent speaker verification is becoming popular in the speaker recognition community. However, the conventional i-vector framework, which has been successful for speaker identification and other similar tasks, works relatively poorly in this task. Researchers have proposed several new methods to improve performance, but it is still unclear which model is the best choice, especially when the pass-phrases are prompted during enrollment and test. In this paper, we introduce four modeling methods and comp…

Cited by 12 publications
(2 citation statements)
References 33 publications
“…The FB alignment, which can be seen as a soft version of the Viterbi algorithm, computes the posterior from forward and backward probabilities [14]. In [15], it was shown that these two types of alignments result in similar performance. To better compare the alignments generated by HMM and DNN, the soft FB alignment is used in our experiments.…”
Section: The Role of HMM (mentioning)
confidence: 99%
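
The forward-backward (FB) alignment mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited authors' implementation: the function name `forward_backward`, its arguments, and the toy two-state model in the usage note are hypothetical, and per-frame state likelihoods are assumed to be given and not all zero.

```python
def forward_backward(init, trans, emis):
    """Return per-frame state posteriors gamma[t][s] for an HMM.

    init[s]     : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emis[t][s]  : likelihood of frame t under state s (assumed precomputed)
    """
    T, S = len(emis), len(init)

    # Forward pass: alpha[t][s] is proportional to p(o_1..o_t, q_t = s).
    # Each frame is rescaled to sum to 1 to avoid numerical underflow.
    alpha = [[0.0] * S for _ in range(T)]
    for t in range(T):
        for s in range(S):
            if t == 0:
                prev = init[s]
            else:
                prev = sum(alpha[t - 1][r] * trans[r][s] for r in range(S))
            alpha[t][s] = prev * emis[t][s]
        norm = sum(alpha[t])
        alpha[t] = [a / norm for a in alpha[t]]

    # Backward pass: beta[t][s] is proportional to p(o_{t+1}..o_T | q_t = s).
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(
                trans[s][r] * emis[t + 1][r] * beta[t + 1][r] for r in range(S)
            )
        norm = sum(beta[t])
        beta[t] = [b / norm for b in beta[t]]

    # Soft alignment: the posterior of each state at each frame is the
    # normalized product of the forward and backward terms.
    gamma = []
    for t in range(T):
        unnorm = [alpha[t][s] * beta[t][s] for s in range(S)]
        z = sum(unnorm)
        gamma.append([u / z for u in unnorm])
    return gamma
```

As a usage example, calling `forward_backward([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])` on a toy left-to-right model returns one posterior distribution per frame. Each frame spreads its weight over the states instead of committing to a single state, which is what makes the alignment "soft".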
“…In text-dependent SV, utterance verification can be seen as a subtask. A comparison of different features and modeling techniques for text-dependent SV can be found in [24]. Joint speaker and utterance models with HMM triphone models are used in [25].…”
Section: Introduction (mentioning)
confidence: 99%