“…Arabic language identification for NLP is an important research effort, but few studies have been done in this area for both audio speech [20], [21], [22], [23], [24] and textual forms [25], [26]. The Mel Frequency Cepstral Coefficient (MFCC) is commonly used to solve this problem [10], [11], [27], [28], [29]. Heracleous et al. [1] presented experiments for SLID using Convolutional Neural Network (CNN) and Deep Neural Network (DNN) methods.…”
Section: Literature Review (mentioning)
confidence: 99%
“…Sisodia et al. [29] assessed some of the ensemble learning methods for SLID using MFCC and Delta Mel Frequency Cepstral Coefficient (DFCC) features. They recorded speech audio files in five languages: French, Dutch, English, German, and Portuguese.…”
Section: Literature Review (mentioning)
confidence: 99%
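The delta (DFCC) features mentioned in the snippet above capture the frame-to-frame trajectory of the MFCCs. A minimal sketch of the standard regression-based delta computation in NumPy (the window width `N=2` is a common default, not a value taken from the cited paper):

```python
import numpy as np

def delta(feat, N=2):
    """First-order delta features for a (frames x coefficients) matrix.

    Each delta frame is a least-squares slope estimate over a window of
    +/- N neighbouring frames, the usual formulation in speech front ends.
    """
    T = feat.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Pad with edge frames so the first/last frames have full windows
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    d = np.zeros_like(feat, dtype=float)
    for t in range(T):
        for n in range(1, N + 1):
            d[t] += n * (padded[t + N + n] - padded[t + N - n])
    return d / denom
```

Stacking `feat` and `delta(feat)` column-wise yields the combined MFCC + DFCC feature vector the cited work describes.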
“…The Mel Frequency Cepstral Coefficient (MFCC) is a well-known signal feature used extensively in spoken language identification [27], [29]. MFCC is based on the human peripheral auditory system.…”
Section: The Mel Frequency Cepstral Coefficient (mentioning)
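The MFCC pipeline the snippet refers to (pre-emphasis, framing, mel-scale filterbank modelling the peripheral auditory system, log compression, DCT) can be sketched in plain NumPy/SciPy. Frame length, hop, FFT size, and filter counts below are common defaults, not values taken from the cited papers:

```python
import numpy as np
from scipy.fft import dct

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.01,
         n_filters=26, n_ceps=13):
    """Compute a (frames x n_ceps) MFCC matrix from a 1-D signal."""
    # Pre-emphasis boosts high frequencies attenuated in speech production
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping, Hamming-windowed frames (signal must be
    # at least one frame long)
    flen, fstep = int(round(frame_len * sr)), int(round(frame_step * sr))
    n_frames = 1 + (len(emphasized) - flen) // fstep
    frames = np.stack([emphasized[i * fstep: i * fstep + flen]
                       for i in range(n_frames)]) * np.hamming(flen)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filterbank spaced linearly on the mel scale
    hz_to_mel = lambda hz: 2595 * np.log10(1 + hz / 700.0)
    mel_to_hz = lambda mel: 700 * (10 ** (mel / 2595.0) - 1)
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # Log filterbank energies, then DCT to decorrelate; keep n_ceps terms
    feat = power @ fbank.T
    feat = np.log(np.where(feat == 0, np.finfo(float).eps, feat))
    return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

The DCT step is what makes these "cepstral" coefficients: it decorrelates the log filterbank energies so the first few coefficients summarise the spectral envelope.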
Spoken Language Identification (SLID) is an important step in speech-to-speech translation systems and multilingual automatic speech recognition. In recent research, deep learning mechanisms have been the prevailing approaches for spoken language identification. This paper aims to study, detect, and analyze spoken languages that are similar to Arabic in the pronunciation of certain words, and then proposes a deep learning-based architecture, specifically the Bidirectional Long Short-Term Memory (BLSTM), for spoken Arabic language identification and discrimination between these similar languages, namely German, Spanish, French, and Russian, all taken from the Mozilla speech corpus. Additionally, our work involves a linguistic study of the considered languages. A total of ten thousand speakers are chosen across all five languages, and the BLSTM architecture is designed and implemented using acoustic signal features and applied to five experiments in this paper. The results show a precision of 98.97%, 98.73%, 98.47%, and 99.75% for identifying spoken Arabic separately against German, Spanish, French, and Russian, respectively. Additionally, we achieved an average accuracy of 95.15% for discriminating between all five considered languages in terms of the pronunciation of words. Our findings confirm that a BLSTM architecture is able to distinguish between observably similar pronunciations of words in the considered languages.
“…As a backup in the event that the proposed method's acoustic systems fail to identify the language of incoming speech with high confidence, a phonetic system is utilised, with the scores from both systems being pooled. Dilip Singh Sisodia et al. [6], using senones as targets, illustrated and investigated both the benefits and the key downsides of expanding the output layer size, with a particular focus on the size of the input temporal context. As reported in that paper, a baseline monolingual feature trained on a language similar to the test language performed better than multilingually trained features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Step 5: Build Feature Map. The datasets in [1,4,6] are derived from the Indic speech corpus of the International Institute of Information Technology, Hyderabad (IIIT-H), which includes 1000 spoken sentences in each of seven different languages. Thus, a total of 7000 audio samples were utilized in their language detection model.…”
In Western countries, speech-recognition applications are widely accepted; in East Asia they are less common. The complexity of the languages may be one of the main reasons for this lag. Furthermore, multilingual nations such as India must be considered in order to achieve language recognition (of words and phrases) using speech signals. Over the last decade, experts have called for more study of speech. In the initial pre-processing step, pitch and audio feature extraction techniques were used, followed by a deep learning classification method, to properly identify the spoken language. This review discusses various feature extraction approaches along with their advantages and disadvantages. The purpose of this research is to study transfer learning approaches such as AlexNet, VGGNet, and ResNet alongside CNNs; using a CNN model, the best accuracy for language recognition was obtained.