Performance Analysis of the Warped Discrete Cosine Transform Cepstrum with MFCC Using Different Classifiers

Sangwan, Abhijeet; Muralishankar, R.; O’Shaughnessy, D.

doi:10.1109/mlsp.2005.1532882

Cited by 4 publications

(8 citation statements)

References 11 publications


“…Statistical analysis of feature sets, MFCCs and 3PR, is performed to understand the differentiative ability of the feature sets for seven aGender classes. Since MFCC features are studied extensively in literature [41][42][43] we do not include its low order moments, mean and standard deviation, and only include its high order moments analysis, skewness and kurtosis here. For 3PR set, both low-order and high-order moments are included.…”

Section: Feature Extraction

mentioning

confidence: 99%

A new pitch-range based feature set for a speaker’s age and gender classification

Barkana

Zhou

2015

Applied Acoustics
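The citation statement above refers to the skewness and kurtosis of feature sets. A minimal sketch of how such high-order moments might be computed per feature dimension follows. It assumes population (biased) estimators and excess kurtosis, since the quoted text does not say which definitions were used. All function names are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment (0 for a symmetric sample)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0


def moments_per_dimension(frames):
    """frames: list of equal-length feature vectors, one per frame.

    Returns one (skewness, excess_kurtosis) pair per feature dimension.
    Assumes every dimension has non-zero variance.
    """
    columns = list(zip(*frames))
    return [(skewness(c), excess_kurtosis(c)) for c in columns]
```

For example, `moments_per_dimension(frames)` on a list of per-frame coefficient vectors returns one pair per coefficient index, which can then be compared across classes.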


“…This may be attributed to its dynamic range being the least (see Fig. 4), significant spin-offs of which are low cepstral variance, tighter clusters and good separability [4,5].…”

Section: Results

mentioning

confidence: 99%

“…Moreover, for each of the three models the dynamic range of the warped versions is smaller than that of the corresponding unwarped versions. Although the MFCC-based algorithm outperforms the WDCTC-based algorithm by 2 to 3% in terms of the recognition rate under zero-noise conditions, the latter has been shown to outperform the former under noisy conditions with respect to the same criterion [3][4][5].…”

Section: Results

mentioning

confidence: 99%

“…Its enhanced performance over the mel-frequency cepstral coefficients (MFCC) in a vowel recognition and speaker-identification task has been highlighted there. Some significant findings about the WDCTC such as (i) good vowel class separability, (ii) low variance, (iii) good codebook representation, (iv) robustness to noise, and (v) better approximation towards a Gaussian distribution has been reported [4,5]. Moreover, MFCC has proved to be difficult to analyze [6].…”

mentioning

confidence: 99%


A Performance Analysis of Features from Complex Cepstra of Warped DST, DCT and DHT Filters for Phoneme Recognition

Muralishankar

Shankar

O’Shaughnessy

2007

2007 15th International Conference on Digital Signal Processing

Self Cite


An analytical model has been developed for the warped discrete Hartley transform cepstrum (WDHTC) in a recent work [1]. Along similar lines, the warped discrete cosine transform (WDCT) has since been modelled in a companion paper [2]. These were preceded by empirical studies of the WDCT cepstrum (WDCTC) as applied to speech feature extraction for vowel recognition and speaker identification [3]. In this paper, we derive the theoretical complex cepstrum (TCC) based on the warped discrete sine transform. We argue that the common recipe evolved through these papers may be used as a measure to compare analytically deducible front-end speech recognition schemes. In particular, we show that the WDCTC-based scheme outperforms the present warped discrete sine transform cepstrum (WDSTC)-based scheme and the one based on the warped discrete Hartley transform in terms of low variance of features due to reduced spectral dynamic range. The phoneme recognition performance of WDCTC, WDHTC and WDSTC corroborates our analytical findings.

INTRODUCTION

In a vowel recognition experiment, the warped discrete cosine transform cepstrum (WDCTC) has been proposed as a feature [3]. Its enhanced performance over the mel-frequency cepstral coefficients (MFCC) in a vowel recognition and speaker-identification task has been highlighted there. Some significant findings about the WDCTC, such as (i) good vowel class separability, (ii) low variance, (iii) good codebook representation, (iv) robustness to noise, and (v) better approximation towards a Gaussian distribution, have been reported [4,5]. Moreover, MFCC has proved to be difficult to analyze [6]. The use of WDCT in computing the WDCTC gives a significant advantage in analysis [2].

A preliminary comparison between the analytical models of the warped discrete Hartley transform cepstrum (WDHTC)-based and WDCTC-based schemes has been reported [1].

The strategy we adopt in this paper is founded on an appraisal that it is desirable to have a platform to compare algorithms based on different transforms. Such a platform may be built by validating the analytically developed models vis-a-vis recognition experiments on the TIMIT database. It will then lend credence to rank ordering the schemes based on different transforms, as we will be in a position to compare across the analytical models of the schemes themselves. A warped discrete sine transform cepstrum (WDSTC)-based front-end extractor is first proposed and modelled analytically in this paper. As a next logical step forward, we compare the WDHTC-, WDCTC- and WDSTC-based analytical models.

Our approach here towards deriving the theoretical complex cepstrum (TCC) of the WDST is similar to the methodology in [1,2]. We sketch relevant details of the approach there, placing


“…It is within the domain of robust features that we had developed and introduced in the warped discrete cosine transform cepstrum (WDCTC) [9]. We had benchmarked the new feature against the popular Mel-frequency cepstral coefficients (MFCC) in terms of its statistical properties and performance in simple recognition tasks [13]. Further, a new feature representation called the Perceptual-MVDR (PMVDR) [18] has been proposed by Yapanel et.…”

Section: Introductionmentioning

confidence: 99%

A Comparative Analysis of Noise Robust Speech Features Extracted from All-Pass Based Warping with MFCC in a Noisy Phoneme Recognition

Muralishankar

O’Shaughnessy

2008

2008 The Third International Conference on Digital Telecommunications (ICDT 2008)

Self Cite


In this paper, we investigate the noise robustness of three features, namely, the warped discrete Fourier transform cepstrum (WDFTC), perceptual minimum variance distortionless response (PMVDR) and Mel-frequency cepstral coefficients (MFCC). Here, WDFTC and PMVDR features are generated by adopting all-pass based warping and for the MFCC, we know that spectral warping is generally employed. The PMVDR and WDFTC use warped-LP and warped discrete Fourier transforms, respectively. Particularly, we employ the WDFTC, PMVDR and MFCC features in a continuous noisy monophone recognition task using the TIMIT corpus and a wide variety of acoustical noise types at different SNRs (signal-to-noise ratios). Further, we test these features on a gender-specific monophone recognition task. Finally, we report the recognition performance and discuss many interesting properties of these features. Our study shows that the PMVDR and WDFTC achieve recognition performance superior to the MFCC in noisy conditions.
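The abstract above describes testing features on speech corrupted by noise at different SNRs. A minimal sketch of how a noise signal might be scaled to reach a target SNR before feature extraction follows. It assumes SNR is defined as the ratio of average clean-signal power to average noise power over the whole utterance. The abstract does not give the authors' exact mixing procedure, and the function names are hypothetical.

```python
import math


def power(signal):
    """Average power (mean of squared samples)."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return clean + gain * noise, with gain chosen to hit snr_db.

    The gain g solves 10 * log10(P_clean / (g**2 * P_noise)) = snr_db.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = power(clean)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    gain = math.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its known clean version."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))
```

Calling `mix_at_snr(clean, noise, 10.0)` and then `measured_snr_db(clean, noisy)` should recover a value close to 10 dB, which is a quick sanity check before running any recognition experiment on the mixed signal.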

