“…Arabic language identification for NLP is an important research effort, but few studies have been done in this area for both audio speech [20], [21], [22], [23], [24] and textual forms [25], [26]. The Mel Frequency Cepstral Coefficient (MFCC) is commonly used to solve this problem [10], [11], [27], [28], [29]. Heracleous et al. [1] presented experiments for SLID using Convolutional Neural Network (CNN) and Deep Neural Network (DNN) methods.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Sisodia et al. [29] assessed some of the ensemble learning methods for SLID using MFCC and Delta Mel Frequency Cepstral Coefficient (DFCC) features. They recorded speech audio files in five languages: French, Dutch, English, German, and Portuguese.…”
Section: Literature Review (mentioning)
confidence: 99%
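The delta (DFCC) features mentioned in the snippet above capture the frame-to-frame trajectory of the MFCCs. A minimal sketch of the standard regression-based delta computation in NumPy (the window width `N=2` is a common default, not a value taken from the cited paper):

```python
import numpy as np

def delta(feat, N=2):
    """First-order delta features for a (frames x coefficients) matrix.

    Each delta frame is a least-squares slope estimate over a window of
    +/- N neighbouring frames, the usual formulation in speech front ends.
    """
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Pad with edge frames so the first/last frames have full windows
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat, dtype=float)
    for t in range(T):
        for n in range(1, N + 1):
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom
```

Stacking `feat` and `delta(feat)` column-wise yields the combined MFCC + DFCC feature vector the cited work describes.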
“…The Mel Frequency Cepstral Coefficient (MFCC) is a well-known signal feature used extensively in spoken language identification [27], [29]. MFCC is based on the human peripheral auditory system.…”
Section: The Mel Frequency Cepstral Coefficient (mentioning)
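The MFCC pipeline the snippet refers to (pre-emphasis, framing, mel-scale filterbank modelling the peripheral auditory system, log compression, DCT) can be sketched in plain NumPy/SciPy. Frame length, hop, FFT size, and filter counts below are common defaults, not values taken from the cited papers:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.01,
         n_filters=26, n_ceps=13):
    """Compute a (frames x n_ceps) MFCC matrix from a 1-D signal."""
    # Pre-emphasis boosts high frequencies attenuated in speech production
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping, Hamming-windowed frames (signal must be
    # at least one frame long)
    flen, fstep = int(round(frame_len * sr)), int(round(frame_step * sr))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    frames = np.stack([emphasized[i * fstep: i * fstep + flen]
                       for i in range(n_frames)]) * np.hamming(flen)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filterbank spaced linearly on the mel scale
    hz_to_mel = lambda hz: 2595 * np.log10(1 + hz / 700.0)
    mel_to_hz = lambda mel: 700 * (10 ** (mel / 2595.0) - 1)
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate; keep n_ceps terms
    feat = power @ fbank.T
    feat = np.log(np.where(feat == 0, np.finfo(float).eps, feat))
    return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The DCT step is what makes these "cepstral" coefficients: it decorrelates the log filterbank energies so the first few coefficients summarise the spectral envelope.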
Spoken Language Identification (SLID) is an important step in speech-to-speech translation systems and multilingual automatic speech recognition. In recent research, deep learning mechanisms have been the prevailing approaches for spoken language identification. This paper aims to study, detect, and analyze spoken languages that are similar to Arabic in the pronunciation of certain words, and then proposes a deep learning-based architecture, specifically the Bidirectional Long Short-Term Memory (BLSTM), for spoken Arabic language identification and discrimination between these similar languages, namely German, Spanish, French, and Russian, all taken from the Mozilla speech corpus. Additionally, our work involves a linguistic study of the considered languages. A total of ten thousand speakers are chosen across all five languages, and the BLSTM architecture is designed and implemented using acoustic signal features and applied to five experiments in this paper. The results show a precision of 98.97%, 98.73%, 98.47%, and 99.75% for identifying spoken Arabic separately against German, Spanish, French, and Russian, respectively. Additionally, we achieved an average accuracy of 95.15% for discriminating between all five considered languages in terms of the pronunciation of words. Our findings confirm that a BLSTM architecture is able to distinguish between observably similar pronunciations of words in the considered languages.
“…As a backup in the event that the proposed method's acoustic systems fail to identify the language of incoming speech with high confidence, a phonetic system is utilised, with the scores from both systems being pooled. Dilip Singh Sisodia et al. [6], using senones as targets, illustrated and investigated both the benefits and the key downsides of expanding the output layer size, with a particular focus on the size of the input temporal context. As reported in that paper, a baseline monolingual feature trained on a language similar to the test language performed better than multilingually trained features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Step 5: Build Feature Map. The datasets in [1,4,6] are derived from the Indic speech corpus of the International Institute of Information Technology, Hyderabad (IIIT-H), which includes 1000 spoken sentences in each of seven different languages. Thus, a total of 7000 audio samples were utilized in their language detection model.…”
In Western countries, speech-recognition applications are widely accepted; in East Asia they are less common. The complexity of the languages may be one of the main reasons for this lag. Furthermore, multilingual nations such as India must be considered in order to achieve language recognition (of words and phrases) using speech signals. Over the last decade, experts have called for more study of speech. In the initial pre-processing step, pitch and audio feature extraction techniques were used, followed by a deep learning classification method, to properly identify the spoken language. This review discusses various feature extraction approaches along with their advantages and disadvantages. The purpose of this research is to study transfer learning approaches such as AlexNet, VGGNet, and ResNet alongside CNNs; using a CNN model, the best accuracy for language recognition was obtained.