Spoken Language Identification System Using Convolutional Recurrent Neural Network

Alashban, Adal A.; Qamhan, Mustafa A.; Meftah, Ali H.; Alotaibi, Yousef Ajami

doi:10.3390/app12189181

Cited by 18 publications

(32 citation statements)

References 38 publications


“…Most studies on speech recognition use feature types such as MFCC, GFCC, spectrogram, spectral characteristics, PLP, and LPC [11][12][13][14]. However, the most recent methods include the use of joint factor analysis (JFA) and i-vector-based methods [15][16][17].…”

Section: Related Work (mentioning)

confidence: 99%

“…Gammatone Filter Bank Cepstral Coefficients (GTCC) are obtained by replacing the triangular filter bank in MFCC with a set of Gammatone filters emphasizing different frequency bands [20,21]. Compared to MFCC, GTCC is reported to be more robust to noise [14]. Prosodic and spectro-temporal features comprise pitch, energy, duration, rhythm, and temporal features.…”

Section: Related Work (mentioning)

confidence: 99%


Speaker identification using hybrid subspace, deep learning and machine learning classifiers

Keser, Gezer

2024

Preprint


Speaker identification is crucial in many application areas, such as automation, security, and user experience. This study examines the use of traditional classification algorithms and hybrid algorithms, as well as newly developed subspace classifiers, in the field of speaker identification. In the study, six different feature structures were tested for the various classifier algorithms. Stacked Features-Common Vector Approach (SF-CVA) and Hybrid CVA-FLDA (HCF) subspace classifiers are used for the first time in the literature for speaker identification. In addition, CVA is evaluated for the first time for speaker recognition using hybrid deep learning algorithms. This paper also aims to increase accuracy rates with different hybrid algorithms. The study includes Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM), i-vector + PLDA, Time Delay Neural Network (TDNN), AutoEncoder + Softmax (AE + Softmax), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Common Vector Approach (CVA), SF-CVA, HCF, and AlexNet classifiers for speaker identification. The six different feature extraction approaches consist of Mel Frequency Cepstral Coefficients (MFCC) + Pitch, Gammatone Cepstral Coefficients (GTCC) + Pitch, MFCC + GTCC + Pitch + eight spectral features, spectrograms, i-vectors, and AlexNet feature vectors. For SF-CVA, 100% accuracy was achieved in most tests by combining the training and test feature vectors of the speakers separately. RNN-LSTM, i-vector + KNN, AE + Softmax, TDNN, and i-vector + HCF classifiers gave the highest accuracy rates in the tests performed without combining training and test feature vectors.


“…There is no standard technique that can serve as the gold standard for discriminating between different languages. Additionally, the study of the possible similarities and dissimilarities between Arabic and other languages is urgently needed to improve spoken language identification [14].…”

Section: Literature Review (mentioning)

confidence: 99%

A Deep Learning Approach for Identifying and Discriminating Spoken Arabic Among Other Languages

Alashban, Alotaibi

2023

IEEE Access

Self Cite


Spoken Language Identification (SLID) is an important step in speech-to-speech translation systems and multilingual automatic speech recognition. In recent research, deep learning mechanisms have been the prevailing approaches for spoken language identification. This paper aims to study, detect, and analyze spoken languages similar to Arabic in pronouncing certain words and then proposes a deep learning-based architecture, specifically the Bidirectional Long Short-Term Memory (BLSTM), for spoken Arabic language identification and discrimination between these similar languages, namely, German, Spanish, French, and Russian, all of which are taken from the Mozilla speech corpus. Additionally, our work involves a linguistic study of the considered languages. A total of ten thousand speakers are chosen across all five languages, and the BLSTM architecture is designed and implemented using acoustic signal features and applied to five experiments in this paper. The results show a precision of 98.97%, 98.73%, 98.47%, and 99.75% for identifying spoken Arabic paired with German, Spanish, French, and Russian, respectively. Additionally, we achieved an average accuracy of 95.15% for discriminating between all five considered languages in terms of the pronunciation of words. Our findings confirm that a BLSTM architecture is able to distinguish between observably similar pronunciations of words in the considered languages.


“…The above equation can be described as scalar product among the log spectral energy vector and a vector of weighting factors WF_l as in Eq. (7).…”

Section: Feature Extraction Techniques (mentioning)

confidence: 99%
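
The citation statement above describes a cepstral coefficient as a scalar product between the log spectral energy vector and a vector of weighting factors. The citing paper's Eq. (7) is not reproduced here, so the following is only a minimal sketch under a stated assumption: that the weighting factors take the cosine (DCT-II) form commonly used for MFCC-style features. The function names and the example energies are hypothetical.

```python
import math

def weighting_factors(l, num_bands):
    """Cosine weights for cepstral index l.

    ASSUMPTION: DCT-II basis, as commonly used for MFCC-style features;
    the citing paper's actual weighting factors may differ.
    """
    return [math.cos(math.pi * l * (m + 0.5) / num_bands) for m in range(num_bands)]

def cepstral_coefficient(log_energies, l):
    """Scalar product of the log spectral energy vector with the weights for index l."""
    weights = weighting_factors(l, len(log_energies))
    return sum(e * w for e, w in zip(log_energies, weights))

def cepstral_vector(band_energies, num_coeffs):
    """Take the log of each (positive) band energy, then compute num_coeffs coefficients."""
    log_energies = [math.log(e) for e in band_energies]
    return [cepstral_coefficient(log_energies, l) for l in range(num_coeffs)]

# Hypothetical filter-bank energies for one frame (not taken from any paper).
frame_energies = [2.0, 1.5, 0.8, 0.4]
coeffs = cepstral_vector(frame_energies, 3)
```

With this form, index 0 simply sums the log energies, while higher indices weight the bands with progressively faster cosine oscillations, which is why a flat spectrum yields zero for every index above 0.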

“…Ladakhi etc. In 2022, Alashban et al (7) proposed a spoken language identification system that depends on the sequence of feature vectors. The proposed system used a hybrid Convolutional Recurrent Neural Network (CRNN) that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN) network, for spoken language identification on seven languages, including Arabic, chosen from subsets of the Mozilla Common Voice (MCV) corpus.…”

Section: Introduction (mentioning)

confidence: 99%

Improved Support Vector-Recurrent Neural Network with Optimal Feature Selection-based Spoken Language Identification System

Thukroo, Bashir, Giri

2023

IJST


Objective: Spoken language identification, being at the forefront of language recognition tasks and concerning the most significant medium of communication, has to be enhanced in order to improve the accuracy of recently developed spoken language recognition systems. The purpose of this paper is to enhance the Spoken Language Identification (SLID) model for the regionally spoken languages of Jammu & Kashmir (JK) and Ladakh using a hybrid of machine learning and deep learning models. Method: Initially, the speech signals of different languages of JK and Ladakh are manually collected from diverse sources and preprocessed using the Spectral Noise Gate (SNG) filtering technique. Once the speech signals are preprocessed, feature extraction is performed using cepstral features such as Mel-frequency Cepstral Coefficients (MFCCs) and Relative Spectral Transform-Perceptual Linear Prediction (RASTA-PLP), and spectral features such as spectral roll-off and spectral flatness. Findings: The feature vector resulting from this extraction is long, and its length needs to be reduced. Hence, optimal feature selection is accomplished using a new meta-heuristic algorithm termed the Adaptive Distance-based Tunicate Swarm Algorithm (AD-TSA), with minimum correlation as the objective. Finally, language identification is handled by a hybrid classifier termed the Improved Support Vector Machine-Recurrent Neural Network (ISVM-RNN). Novelty: AD-TSA enhances the identification learning algorithm by taking minimum correlation among features as the objective, in order to obtain the minimum number of features sufficient for the language identification process. The efficiency of the proposed hybrid approach is validated by simulating the experiment on a user-defined language database of JK and Ladakh speech signals, using Python as the working platform.
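
The abstract above selects features with minimum correlation as the objective. The sketch below illustrates only that objective, using a plain greedy search as a stand-in: it is not AD-TSA, whose update rules the abstract does not give. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def mean_abs_correlation(columns, subset):
    """Objective to minimize: mean |correlation| over all feature pairs in subset."""
    pairs = list(combinations(subset, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)

def greedy_select(columns, k):
    """Greedy stand-in for a meta-heuristic search (NOT AD-TSA).

    Starts from feature 0 and repeatedly adds the feature that keeps
    the objective lowest, until k features are selected.
    """
    selected = [0]
    while len(selected) < k:
        remaining = [i for i in range(len(columns)) if i not in selected]
        best = min(remaining, key=lambda i: mean_abs_correlation(columns, selected + [i]))
        selected.append(best)
    return selected

# Toy data: each inner list is one feature measured over four utterances.
features = [
    [1.0, 2.0, 3.0, 4.0],    # feature 0
    [2.0, 4.0, 6.0, 8.0],    # feature 1: a scaled copy of feature 0 (redundant)
    [1.0, -1.0, -1.0, 1.0],  # feature 2: uncorrelated with feature 0
]
chosen = greedy_select(features, 2)
```

In the toy data the redundant feature 1 is skipped in favor of feature 2. A population-based search such as the one named in the abstract would explore many candidate subsets rather than commit to one greedy path.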
