Abstract: In this paper, we propose using novel acoustic features, namely zero-time windowing cepstral coefficients (ZTWCC), for dialect classification. ZTWCC features are derived from the high-resolution spectrum obtained with the zero-time windowing (ZTW) method, and were shown to discriminate speech sound characteristics more effectively than a DFT spectrum. Our proposed system is based on i-vectors trained on static and shifted delta coefficients of ZTWCC. The i-vectors are further whitened before class…
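As a rough illustration of the "static and shifted delta coefficients" mentioned in the abstract, the sketch below stacks shifted delta blocks onto each static cepstral frame. It is a minimal sketch, not the authors' implementation: the function name `shifted_delta`, the default parameters (d=1, P=3, k=7) and the edge-clamping rule are assumptions, since the abstract does not state the configuration used.

```python
def shifted_delta(cepstra, d=1, P=3, k=7):
    """Append k shifted-delta blocks to each frame of static coefficients.

    cepstra: list of frames, each a list of N static coefficients.
    d: delta spread, P: shift between blocks, k: number of blocks.
    These defaults are hypothetical, not taken from the paper.
    Returns one vector per frame with N + N*k values.
    """
    T = len(cepstra)

    def frame(i):
        # Clamp indices at the utterance edges (an assumption; padding schemes vary).
        return cepstra[min(max(i, 0), T - 1)]

    out = []
    for t in range(T):
        vec = list(cepstra[t])  # static part
        for i in range(k):
            ahead = frame(t + i * P + d)
            behind = frame(t + i * P - d)
            # Delta computed at a position shifted i*P frames into the future.
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


if __name__ == "__main__":
    # Toy input: 10 frames with a single coefficient that grows linearly.
    toy = [[float(t)] for t in range(10)]
    print(shifted_delta(toy, d=1, P=3, k=2)[2])
```

Each output vector holds the N static values followed by k delta blocks of N values each, so the feature dimension is N(k+1).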
“…In the future, we plan to explore the Mel-SFF spectrogram derived features for dialect identification in noisy conditions [27], [28], [30] and for larger corpora. Further, we plan to investigate the complementary information between the SFF-based spectrograms and zero-time windowing (ZTW) based spectrograms, which were shown to give better performance over STFT [42]-[44].…”
The second author would like to thank the Academy of Finland (Projects 312490 and 330139) for supporting his stay in Finland as a Postdoctoral researcher.
“…The experimental results demonstrate strong performance with an accuracy of 81.26%; these results were attained by using their techniques on a common dataset of 8 English accents. Similar works include [23,24,25,26].…”
Accents, or variations in how different people speak the same word or sentence in the same language, pose substantial communication issues in most spoken languages. This is a well-known fact, but how does the accent of one language affect learning or speaking another? In this paper, we look at how Arabic accents influence spoken English. To that end, we built a deep-learning system for Arabic accent recognition that was trained on an in-house English speech database of four Arabic accents collected from Jordan, Iraq, Saudi Arabia, and Tunisia. The proposed system uses Mel spectrograms of an English-spoken paragraph to train an LSTM neural network to recognize the accent in each sound signal. Although the collected data was extremely difficult to learn due to the presence of both males and females and fluent speakers in each class, the proposed system could recognize speakers with various accents with an accuracy of up to 79%. This answers the study's main question, demonstrating that speakers with an Arabic accent have their own way of speaking English, which varies by country. As a result, if trained on appropriate and adequate data, the proposed system can also be used to recognize accents in any language.
“…Many feature sets have been proposed with statistical and deep learning-based classifiers. A few widely used feature sets are as follows: Mel frequency cepstrum coefficients (MFCCs); inverse MFCCs (IMFCCs) [15]; linear frequency cepstrum coefficients (LFCCs); constant Q cepstrum coefficients (CQCCs) [16]; log-power spectrum using discrete Fourier transform (DFT) [17]; Gammatonegram, group delay over the frame, referred to as GD-gram [18]; modified group delay; All-Pole Group Delay [19]; Cochlear Filter Cepstral Coefficient—Instantaneous Frequency [20]; cepstrum coefficients using single-frequency filtering [21, 22]; Zero-Time Windowing (ZTW) [23]; Mel-frequency cepstrum using ZTW [24]; and polyphase IIR filters [25]. The human ear uses Fourier transform magnitude and neglects the phase information [26].…”
Voice-controlled devices are in demand due to their hands-free controls. However, using voice-controlled devices in sensitive scenarios like smartphone applications and financial transactions requires protection against fraudulent attacks referred to as “speech spoofing”. The algorithms used in spoof attacks are practically unknown; hence, further analysis and development of spoof-detection models for improving spoof classification are required. A study of the spoofed-speech spectrum suggests that high-frequency features are able to discriminate genuine speech from spoofed speech well. Typically, linear or triangular filter banks are used to obtain high-frequency features. However, a Gaussian filter can extract more global information than a triangular filter. In addition, MFCC features are preferable among other speech features because of their lower covariance. Therefore, in this study, the use of a Gaussian filter is proposed for the extraction of inverted MFCC (iMFCC) features, providing high-frequency features. Complementary features are integrated with iMFCC to strengthen the features that aid in the discrimination of spoofed speech. Deep learning has been proven to be efficient in classification applications, but the selection of its hyper-parameters and architecture is crucial and directly affects performance. Therefore, a Bayesian algorithm is used to optimize the BiLSTM network. Thus, in this study, we build a high-frequency-based optimized BiLSTM network to classify the spoofed-speech signal, and we present an extensive investigation using the ASVSpoof 2017 dataset. The optimized BiLSTM model was successfully trained in the fewest epochs and achieved a 99.58% validation accuracy. The proposed algorithm achieved a 6.58% EER on the evaluation dataset, a relative improvement of 78% over a baseline spoof-identification system.
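The equal error rate (EER) reported above is the operating point at which the false-acceptance rate (spoofed trials accepted as genuine) equals the false-rejection rate (genuine trials rejected). The sketch below shows one simple way to estimate it from two lists of detection scores. It is only an illustration under stated assumptions: the function name `equal_error_rate`, the convention that higher scores mean "more likely genuine", and the toy scores are hypothetical and do not come from the paper or from the ASVSpoof evaluation tooling.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER by scanning every observed score as a threshold.

    Assumes higher scores indicate genuine speech (an assumption).
    Returns (eer, threshold) at the point where the false-acceptance
    and false-rejection rates are closest to each other.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # A trial is accepted as genuine when its score is >= t.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        frr = sum(g < t for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, made up for illustration only.
    genuine = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(genuine, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With few trials the two error rates may never cross exactly, which is why the sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.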