Prosody-based Spoken Algerian Arabic Dialect Identification

Bougrine, Soumia; Cherroun, Hadda; Ziadi, Djelloul

doi:10.1016/j.procs.2018.03.002

Cited by 10 publications

(2 citation statements)

References 18 publications

“…In Sadat et al (2014), the authors present a bi-gram character-level model to identify the dialect of sentences, in the social media context, among dialects of 18 Arab countries. Bougrine et al (2015) addressed the problem of spoken Algerian dialect identification by using prosodic speech information (intonation and rhythm). They performed an experiment on six dialects from different Algerian regions.…”

Section: Related Work (mentioning)

confidence: 99%

The SMarT Classifier for Arabic Fine-Grained Dialect Identification

Meftouh, Abidi, Harrat et al. 2019

Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes the approach adopted by the SMarT research group to build a dialect identification system in the framework of the Madar shared task on Arabic fine-grained dialect identification. We experimented with several approaches, but we finally decided to use a Multinomial Naïve Bayes classifier based on word and character n-grams in addition to the language model probabilities. We achieved a score of 67.73% in terms of macro accuracy and a macro-averaged F1-score of 67.31%.
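As a rough illustration of the kind of classifier this abstract names, the following minimal sketch trains a Multinomial Naïve Bayes model on word unigrams and character n-grams. It is not the SMarT group's implementation: the feature layout, the add-one smoothing, the n-gram orders, and the toy strings with labels "A" and "B" are all assumptions made for the example, and the language model probabilities mentioned in the abstract are left out.

```python
import math
from collections import Counter, defaultdict


def extract_features(text, char_n=(2, 3)):
    """Return word unigrams plus character n-grams (assumed orders 2 and 3)."""
    words = text.lower().split()
    feats = ["w=" + w for w in words]
    # Pad with spaces so that word boundaries show up in the character n-grams.
    padded = " " + " ".join(words) + " "
    for n in char_n:
        feats += ["c=" + padded[i:i + n] for i in range(len(padded) - n + 1)]
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.feat_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = extract_features(text)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for feat in extract_features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: two made-up "dialects" with disjoint character inventories.
train_texts = ["aaa aab", "aab aaa aaa", "zzz zzy", "zzy zzz"]
train_labels = ["A", "A", "B", "B"]
clf = MultinomialNB().fit(train_texts, train_labels)
```

With this toy data, `clf.predict("aaa")` returns `"A"` and `clf.predict("zzy")` returns `"B"`. A real system would be trained on labelled dialect sentences and evaluated with the macro-averaged metrics the abstract reports.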

“…Studies by Hansen and Liu (2016) have shown that acoustic variations are more prominent than the linguistic variations [acoustic models performed better than linguistic models by 15.8% absolute unweighted average recall (UAR)] for major dialects of English. The acoustic variations among dialects include segmental and supra-segmental features, and they can be extracted directly from the speech signal (Behravan et al, 2016; Bougrine et al, 2018; DeMarco and Cox, 2012; Rajpal et al, 2016; Rouas, 2007) or they can be modelled indirectly from the phonetic information derived from the speech signal (Chen et al, 2011; Chen et al, 2014; Najafian et al, 2018; Shon et al, 2018a).…”

Section: Introduction (mentioning)

confidence: 99%

Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations

Kethireddy, Kadiri, Gangashetty 2022

The Journal of the Acoustical Society of America

The goal of this study is to investigate advanced signal processing approaches [single frequency filtering (SFF) and zero-time windowing (ZTW)] with modern deep neural networks (DNNs) [convolution neural networks (CNNs), temporal convolution neural networks (TCN), time-delay neural network (TDNN), and emphasized channel attention, propagation and aggregation in TDNN (ECAPA-TDNN)] for dialect classification of major dialects of English. Previous studies indicated that SFF and ZTW methods provide higher spectro-temporal resolution. To capture the intrinsic variations in articulations among dialects, four feature representations [spectrogram (SPEC), cepstral coefficients, mel filter-bank energies, and mel-frequency cepstral coefficients (MFCCs)] are derived from SFF and ZTW methods. Experiments with and without data augmentation using CNN classifiers revealed that the proposed features performed better than baseline short-time Fourier transform (STFT)-based features on the UT-Podcast database [Hansen, J. H., and Liu, G. (2016). "Unsupervised accent classification for deep data fusion of accent and language information," Speech Commun. 78, 19-33]. Even without data augmentation, all the proposed features showed an approximate improvement of 15%-20% (relative) over best baseline (SPEC-STFT) feature. TCN, TDNN, and ECAPA-TDNN classifiers that capture wider temporal context further improved the performance for many of the proposed and baseline features. Among all the baseline and proposed features, the best performance is achieved with single frequency filtered cepstral coefficients for TCN (81.30%), TDNN (81.53%), and ECAPA-TDNN (85.48%). An investigation of data-driven filters, instead of fixed mel-scale, improved the performance by 2.8% and 1.4% (relatively) for SPEC-STFT and SPEC-SFF, and nearly equal for SPEC-ZTW. To assist related work, we have made the code available ([Kethireddy, R., and Kadiri, S. R. (2022). "Deep neural architectures for dialect classification with single frequency filtering and zero-time windowing feature representations," https://github.com/r39ashmi/e2e_dialect (Last viewed
