Arabic automatic speech recognition (ASR) systems that handle diacritics can be integrated with other systems more readily than those that do not. In this work, state-of-the-art end-to-end deep learning approaches are investigated to build a robust diacritised Arabic ASR. These approaches use Mel-Frequency Cepstral Coefficients and log Mel-scale filter bank energies as acoustic features. To the best of our knowledge, end-to-end deep learning has not previously been applied to diacritised Arabic automatic speech recognition. To fill this gap, this work presents a new CTC-based ASR, a CNN-LSTM, and an attention-based end-to-end approach for improving diacritised Arabic ASR. In addition, a word-based language model is employed to achieve better results. The end-to-end approaches applied in this work are built on state-of-the-art frameworks, namely ESPnet and Espresso. Training and testing of these frameworks are performed on the Standard Arabic Single Speaker Corpus (SASSC), which contains 7 h of Modern Standard Arabic speech. Experimental results show that the CNN-LSTM with attention framework outperforms conventional ASR and the joint CTC-attention ASR framework on Arabic speech recognition, achieving a word error rate better than conventional ASR and joint CTC-attention ASR by 5.24% and 2.62%, respectively. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
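As a minimal sketch of the CTC objective underlying the CTC-based ASR described above (not the authors' implementation): the acoustic network emits a per-frame distribution over characters plus a blank symbol, and CTC sums over all alignments of the shorter character transcript to the frame sequence. Dimensions and the random inputs here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes: 50 acoustic frames, batch of 2, 30 output symbols
# (characters plus the CTC blank at index 0).
T, N, C = 50, 2, 30
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # per-frame log-distributions
targets = torch.randint(1, C, (N, 10))                # 10-character transcripts
input_lengths = torch.full((N,), T)                   # all 50 frames are valid
target_lengths = torch.full((N,), 10)                 # all 10 characters are valid

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(float(loss))  # positive scalar: negative log-likelihood over all alignments
```

In a real system the `log_probs` would come from the CNN-LSTM encoder rather than random noise, and decoding would use beam search with the word-based language model.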
The Arabic language has a set of vowel marks called diacritics; these diacritics play an essential role in the meaning and articulation of words, and a change in some diacritics changes the meaning of the sentence. However, the presence of these marks in corpus transcriptions affects speech recognition accuracy. In this paper, we investigate the effect of diacritics on Arabic speech recognition based on end-to-end deep learning. The applied end-to-end approach combines a CNN-LSTM with an attention-based technique, implemented in the state-of-the-art framework Espresso using PyTorch. In addition, to the best of our knowledge, a CNN-LSTM with attention has not previously been used for Arabic automatic speech recognition (ASR). To fill this gap, this paper proposes a new CNN-LSTM with attention-based approach for Arabic ASR. The language model in this approach is trained using RNN-LM and LSTM-LM on the non-diacritized transcription of the speech corpus. The Standard Arabic Single Speaker Corpus (SASSC), after omitting the diacritics, is used to train and test the deep learning model. Experimental results show that removing diacritics decreased the out-of-vocabulary rate and the perplexity of the language model. In addition, the word error rate (WER) improved significantly compared to the diacritized data, with an average WER reduction of 13.52%.
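The diacritic-removal preprocessing described above can be sketched concisely: Arabic diacritic (tashkeel) marks occupy a contiguous Unicode range, so a regular expression suffices to produce the non-diacritized transcription. This is a generic sketch, not the paper's exact pipeline.

```python
import re

# Arabic diacritic marks (fathah, kasrah, dammah, sukun, shaddah, tanween, ...)
# occupy U+064B..U+0652; stripping them yields non-diacritized text.
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def remove_diacritics(text: str) -> str:
    """Strip short-vowel and other diacritic marks from Arabic text."""
    return DIACRITICS.sub("", text)

print(remove_diacritics("كَتَبَ"))  # -> كتب ("he wrote" without vowel marks)
```

Applying this to every transcript before language-model training is what collapses diacritized word variants into single vocabulary entries, which is consistent with the reported drop in out-of-vocabulary rate and perplexity.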
Named Entity Recognition (NER) is currently an essential research area that supports many tasks in NLP. Its goal is to identify named entities more accurately. This paper presents an integrated semantic-based machine learning (ML) model for the Arabic Named Entity Recognition (ANER) problem. The basic idea of the model is to combine several linguistic features and to exploit syntactic dependencies to infer semantic relations between named entities. The proposed model focuses on recognizing three types of named entities: person, organization, and location. Accordingly, it combines internal features, representing linguistic properties, with external features, representing the semantics of relations between the three entity types, drawing on an external knowledge source such as the Arabic WordNet ontology (AWN). Both feature sets are fed to a CRF classifier and prove effective for ANER. Experimental results show that this approach achieves an overall F-measure of around 87.86% and 84.72% on the ANERCorp and ALTEC datasets, respectively.
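The combination of internal (linguistic) and external (knowledge-based) features for a CRF tagger can be sketched as a per-token feature dictionary. The feature set, the toy gazetteer, and the example sentence below are illustrative assumptions, with a dictionary lookup standing in for the Arabic WordNet query.

```python
# Toy external-knowledge lookup standing in for the AWN ontology query.
GAZETTEER = {"مصر": "LOC"}  # "Egypt" -> location

def token_features(tokens, i):
    """Internal features (word shape, affixes, context) plus an external
    gazetteer flag for token i, in the dict format CRF toolkits expect."""
    w = tokens[i]
    return {
        "word": w,
        "prefix2": w[:2],
        "suffix2": w[-2:],
        "is_first": i == 0,
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "gazetteer": GAZETTEER.get(w, "O"),  # external semantic feature
    }

sent = ["زار", "محمد", "مصر"]  # "Muhammad visited Egypt"
print(token_features(sent, 2)["gazetteer"])  # -> LOC
```

In practice each sentence becomes a sequence of such dictionaries, which a CRF library (e.g. CRFsuite) consumes alongside the gold entity labels.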
The term "crime prevention" refers to a group of initiatives that work with people, communities, businesses, non-governmental organizations, and all levels of government to address the numerous social and environmental risk factors for crime, disorder, and victimization in communities. In this paper, the authors propose various regression models for the prediction of communities and crime, including a decision tree regressor, MLP regressor, SVR, random forest regressor, and K-neighbors regressor. The communities and crime dataset is used for training and evaluating the proposed models. The results show improvements in RMSE, MAE, MBE, R, R2, RRMSE, NSE, and WI when compared to traditional methods.
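Comparing the listed regressors under common error metrics can be sketched with scikit-learn. The synthetic data below is a stand-in assumption for the communities-and-crime features; only RMSE, MAE, and R2 are computed here, as the remaining metrics (MBE, RRMSE, NSE, WI) are not in scikit-learn and would need custom formulas.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the communities-and-crime feature matrix and target.
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "mlp": MLPRegressor(max_iter=2000, random_state=0),
    "svr": SVR(),
    "random_forest": RandomForestRegressor(random_state=0),
    "knn": KNeighborsRegressor(),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5  # root of MSE
    print(f"{name}: RMSE={rmse:.2f} MAE={mean_absolute_error(y_te, pred):.2f} "
          f"R2={r2_score(y_te, pred):.3f}")
```

The same loop applies unchanged to the air-quality experiment described below, with only the dataset swapped.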
Air pollution is a particularly important problem in most countries today because of its severe effects on both the environment and human health. Big cities are most affected because of rapid industrial and economic development. In this paper, the authors propose various regression models for the prediction of air quality, including a decision tree regressor, MLP regressor, SVR, random forest regressor, and K-neighbors regressor. An air quality dataset from Italian cities is used for training and evaluating the proposed models. The results show improvements in RMSE, MAE, MBE, R, R2, RRMSE, NSE, and WI when compared to traditional methods.
The attention-based encoder-decoder technique known as the transformer is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, which has received little attention from the research community. The Holy Qur'an of the Muslims is written in Arabic diacritized text. In this paper, an end-to-end transformer model for building a robust Qur'an verse recognizer is proposed. The acoustic model was built as a transformer-based deep learning model in the PyTorch framework, with a multi-head attention mechanism used in both the encoder and the decoder. A Mel filter bank is used for feature extraction. To build a language model (LM), a Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based LM. As part of this research, a new dataset of Qur'an verses and their associated transcripts was collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results show that the proposed end-to-end transformer-based model achieved a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%, establishing state-of-the-art end-to-end transformer-based recognition of Qur'an reciters.
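The encoder-decoder shape of such a transformer acoustic model can be sketched in PyTorch. This is a minimal illustration, not the authors' architecture: the 80-dim Mel features, layer counts, and the 40-symbol character vocabulary are all illustrative assumptions.

```python
import torch
import torch.nn as nn

N_MELS, D_MODEL, VOCAB = 80, 256, 40  # assumed: Mel bins, model dim, char set

class TinyASRTransformer(nn.Module):
    """Sketch: multi-head-attention encoder-decoder from Mel frames to chars."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(N_MELS, D_MODEL)     # project frames to model dim
        self.embed = nn.Embedding(VOCAB, D_MODEL)  # embed target characters
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB)       # per-position char logits

    def forward(self, mels, tgt_tokens):
        src = self.proj(mels)                      # (batch, frames, d_model)
        tgt = self.embed(tgt_tokens)               # (batch, chars, d_model)
        dec = self.transformer(src, tgt)           # cross-attend to encoder
        return self.out(dec)                       # (batch, chars, vocab)

model = TinyASRTransformer()
mels = torch.randn(2, 100, N_MELS)       # batch of 2 utterances, 100 frames
tgt = torch.randint(0, VOCAB, (2, 20))   # 20 target characters each
logits = model(mels, tgt)
print(logits.shape)  # torch.Size([2, 20, 40])
```

Training would add a causal mask on the decoder and cross-entropy against the diacritized transcripts; decoding would combine the logits with the word-based LM.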