2013
DOI: 10.1016/j.specom.2012.08.007

Multitaper MFCC and PLP features for speaker verification using i-vectors

Abstract: In this paper we study the performance of the low-variance multi-taper Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) features in a state-of-the-art i-vector speaker verification system. The MFCC and PLP features are usually computed from a Hamming-windowed periodogram spectrum estimate. Such a single-tapered spectrum estimate has large variance, which can be reduced by averaging spectral estimates obtained using a set of different tapers, leading to a so-called multitaper spect…
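The averaging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Thomson-style DPSS (Slepian) tapers from SciPy with uniform weights, and the function names, the time-bandwidth product `nw=3.5`, and the FFT size are illustrative choices. The default of six tapers follows the value quoted in the citation statements below.

```python
import numpy as np
from scipy.signal.windows import dpss


def hamming_periodogram(frame, n_fft=512):
    """Single-taper baseline: power spectrum of a Hamming-windowed frame."""
    windowed = np.hamming(len(frame)) * frame
    return np.abs(np.fft.rfft(windowed, n=n_fft)) ** 2


def multitaper_spectrum(frame, n_tapers=6, nw=3.5, n_fft=512):
    """Average the power spectra obtained with several orthogonal tapers.

    Assumption: DPSS tapers with uniform weights. Other multitaper
    variants use different tapers and non-uniform weights.
    """
    # tapers has shape (n_tapers, len(frame)); each row is one taper.
    tapers = dpss(len(frame), nw, Kmax=n_tapers)
    # One power spectrum per taper, shape (n_tapers, n_fft // 2 + 1).
    spectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=1)) ** 2
    # Averaging across tapers is what lowers the variance of the estimate.
    return spectra.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # synthetic white-noise frame
    for name, est in [("hamming", hamming_periodogram(frame)),
                      ("multitaper", multitaper_spectrum(frame))]:
        # Relative spread across frequency bins; the true spectrum is flat.
        print(name, est.std() / est.mean())
```

For a white-noise frame the true spectrum is flat, so the relative spread of each estimate across frequency bins gives a rough picture of its variance. In an MFCC or PLP front end, either estimate would then feed the usual filterbank and cepstral steps.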

Citations: cited by 80 publications (38 citation statements)
References: 30 publications
“…More details about these three tapers can also be found in [10]. In this paper, we use the Thomson multitaper [7], the SWCE [14], and the Multipeak multitaper spectrum estimator [16] to compute the low variance MFCC and PLP features for an emotion recognition system.…”
Section: Mt ŝ (mentioning)
confidence: 99%
“…For this study, we take the number of tapers, M, equal to 6 because it is found that this number optimizes the performance for speech recognition [12] and speaker verification problems, as reported in [10,11].…”
Section: Multitaper Features (mentioning)
confidence: 99%
“…State-of-the-art systems build on top of this basic framework by concatenating the GMM mean vectors to form speaker-specific supervectors, typically classified using Support Vector Machines (SVM) [7] or processed using factor analysis [10] or I-vector analysis [9,26,1].…”
Section: Introduction (mentioning)
confidence: 99%