The Speaker and Language Recognition Workshop (Odyssey 2020), 2020
DOI: 10.21437/odyssey.2020-5
Zero-Time Windowing Cepstral Coefficients for Dialect Classification

Abstract: In this paper, we propose to use novel acoustic features, namely zero-time windowing cepstral coefficients (ZTWCC), for dialect classification. ZTWCC features are derived from the high-resolution spectrum obtained with the zero-time windowing (ZTW) method and have been shown to discriminate speech sound characteristics more effectively than the DFT spectrum. Our proposed system is based on i-vectors trained on static and shifted delta coefficients of ZTWCC. The i-vectors are further whitened before class…
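The abstract refers to two processing steps that the excerpt does not spell out: shifted delta coefficients (SDC) computed over the ZTWCC frames, and whitening of the i-vectors before classification. The following is a minimal NumPy sketch of both, assuming the common N-d-P-k SDC parameterization and a ZCA-style whitening; the parameter values and the exact whitening variant used in the paper are not given above, so these are illustrative only.

```python
import numpy as np

def shifted_delta_coefficients(cepstra, d=1, P=3, k=7):
    """Shifted delta coefficients (SDC) for a (T, N) matrix of cepstral frames.

    For frame t, k delta vectors are stacked:
        delta_i(t) = c(t + i*P + d) - c(t + i*P - d),   i = 0..k-1
    Indices outside the utterance are clamped to the edges (one common choice).
    """
    T, N = cepstra.shape
    sdc = np.zeros((T, N * k))
    for t in range(T):
        blocks = []
        for i in range(k):
            hi = min(T - 1, t + i * P + d)
            lo = max(0, min(T - 1, t + i * P - d))
            blocks.append(cepstra[hi] - cepstra[lo])
        sdc[t] = np.concatenate(blocks)
    return sdc

def whiten_ivectors(ivectors, eps=1e-10):
    """Center and whiten a (num_utterances, dim) matrix of i-vectors (ZCA-style)."""
    mu = ivectors.mean(axis=0)
    X = ivectors - mu
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return X @ W

# "Static and shifted delta coefficients": stack the ZTWCC frames (random
# placeholders here) with their SDCs before training the i-vector extractor.
ztwcc = np.random.randn(300, 13)                 # placeholder for real ZTWCC frames
features = np.hstack([ztwcc, shifted_delta_coefficients(ztwcc)])
print(features.shape)                            # (300, 13 + 13*7) = (300, 104)
```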

Cited by 8 publications (5 citation statements)
References 17 publications (29 reference statements)
“…In the future, we plan to explore the Mel-SFF spectrogram derived features for dialect identification in noisy conditions [27], [28], [30] and for larger corpora. Further, we plan to investigate the complementary information between the SFF based spectrograms and zero-time windowing (ZTW) based spectrograms, which were shown to give better performance over STFT [42]-[44].…”
Section: Discussion
confidence: 99%
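The excerpt above mentions single-frequency filtering (SFF) spectrograms alongside ZTW spectrograms. As a rough sketch of the SFF idea only (not code from the paper or any of the cited works), the envelope at each analysis frequency can be obtained by modulating the signal so that the frequency of interest lands near a single pole close to the unit circle and then applying that one-pole filter; the frequency grid, pole radius, and pre-emphasis below are assumptions.

```python
import numpy as np
from scipy.signal import lfilter

def sff_spectrogram(signal, fs, freqs, r=0.99):
    """Single-frequency filtering (SFF) envelopes, one row per analysis frequency.

    For each frequency f_k, the signal is modulated so that f_k is shifted to
    the pole location (z = -r, i.e. pi rad), then filtered with
    1 / (1 + r z^{-1}); the magnitude of the complex output is the envelope.
    """
    x = np.diff(signal, prepend=signal[0])        # simple pre-emphasis (assumption)
    n = np.arange(len(x))
    env = np.empty((len(freqs), len(x)))
    for k, fk in enumerate(freqs):
        w_shift = np.pi - 2.0 * np.pi * fk / fs   # move f_k towards pi
        xk = x * np.exp(1j * w_shift * n)
        yk = lfilter([1.0], [1.0, r], xk)         # y[n] = -r*y[n-1] + x[n]
        env[k] = np.abs(yk)
    return env

# Example: 1 s toy signal, envelopes on a 100 Hz grid up to 4 kHz.
fs = 16000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1320 * t)
E = sff_spectrogram(sig, fs, freqs=np.arange(100, 4000, 100))
print(E.shape)   # (39, 16000)
```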
“…The experimental results demonstrate strong performance with an accuracy of 81.26%; these results were attained by using their techniques on a common dataset of 8 English accents. Similar works include [23,24,25,26].…”
Section: Related Work
confidence: 99%
“…Many feature sets have been proposed with statistical and deep learning-based classifiers. A few widely used feature sets are as follows: Mel frequency cepstrum coefficients (MFCCs); inverse MFCCs (IMFCCs) [15]; linear frequency cepstrum coefficients (LFCCs); constant Q cepstrum coefficients (CQCCs) [16]; log-power spectrum using discrete Fourier transform (DFT) [17]; Gammatonegram; group delay over the frame, referred to as GD-gram [18]; modified group delay; All-Pole Group Delay [19]; Cochlear Filter Cepstral Coefficient-Instantaneous Frequency [20]; cepstrum coefficients using single-frequency filtering [21, 22]; Zero-Time Windowing (ZTW) [23]; Mel-frequency cepstrum using ZTW [24]; and polyphase IIR filters [25]. The human ear uses Fourier transform magnitude and neglects the phase information [26].…”
Section: Related Work
confidence: 99%
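Most of the feature sets enumerated in the excerpt above start from the magnitude of a short-time Fourier transform, with the phase discarded. The sketch below shows the most common of them, MFCCs, using librosa purely for illustration; it is not tied to the paper or to any of the cited works, and the file name and frame settings are hypothetical.

```python
import numpy as np
import librosa

# Hypothetical input file; any 16 kHz speech recording would do.
y, sr = librosa.load("utterance.wav", sr=16000)

# Short-time Fourier transform: keep only the magnitude, discarding the phase,
# which is the usual starting point for magnitude-based cepstral features.
S = librosa.stft(y, n_fft=512, hop_length=160, win_length=400)
magnitude = np.abs(S)                       # |STFT|; the phase np.angle(S) is ignored

# Standard MFCCs from the mel-warped, log-compressed magnitude spectrum.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=512, hop_length=160, win_length=400)
delta = librosa.feature.delta(mfcc)         # first-order dynamics
print(mfcc.shape, delta.shape)              # (13, n_frames) each
```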