2021
DOI: 10.1049/sil2.12057
Arabic speech recognition using end‐to‐end deep learning

Abstract: Arabic automatic speech recognition (ASR) methods with diacritics can be integrated with other systems better than Arabic ASR methods without diacritics. In this work, the application of state-of-the-art end-to-end deep learning approaches is investigated to build a robust diacritised Arabic ASR system. These approaches are based on the Mel-Frequency Cepstral Coefficients and the log Mel-Scale Filter Bank energies as acoustic features. To the best of our knowledge, end-to-end deep learning approach ha…
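The log Mel-Scale Filter Bank energies named in the abstract can be sketched with a minimal NumPy implementation (parameter values such as 40 filters and a 512-point FFT are illustrative defaults, not taken from the paper):

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular mel filters that map an FFT power spectrum
    onto mel filter-bank channels."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                       # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                       # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_energies(power_spec, fb, eps=1e-10):
    """Log filter-bank energies for one frame's power spectrum."""
    return np.log(fb @ power_spec + eps)
```

MFCCs would then be obtained by applying a discrete cosine transform to these log energies; in practice a library such as librosa or python_speech_features computes both feature types directly.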

Cited by 25 publications (15 citation statements)
References 51 publications
“…Furthermore, CNN has new properties beyond DNN, such as localization, weight sharing, and pooling. In the convolution unit, locality is employed to handle noise [31,32]. Additionally, locality minimizes the network weights that must be learned.…”
Section: Theoretical Background
confidence: 99%
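The locality and weight-sharing properties described in this statement can be illustrated with a minimal NumPy sketch (function names are illustrative, not from the cited work): one small shared kernel slides over the whole input, each output depends only on a local window, and pooling keeps the strongest local response.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution: the same kernel is reused at every
    position (weight sharing), and each output depends only on a
    local window of the input (locality)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response in
    each local window, adding a little translation tolerance."""
    return np.array([x[i:i + size].max()
                     for i in range(0, len(x) - size + 1, size)])
```

Because the kernel is shared, a layer needs only `len(kernel)` weights regardless of input length, which is the weight reduction the statement refers to.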
“…Other models, such as GMMs and DNNs, find it harder to handle this shifting. As a result, ASR researchers have recently employed localization along both the frequency and time axes of speech signals [31,32].…”
Section: Theoretical Background
confidence: 99%
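The shift tolerance this statement attributes to time-frequency localization can be demonstrated with a small NumPy sketch (a toy illustration, not the cited models): a local patch is correlated across a spectrogram-like array, and its peak response is the same wherever the pattern sits along either axis — only the peak's location moves.

```python
import numpy as np

def xcorr2d(spec, patch):
    """Slide a small time-frequency patch over a 2-D array and
    record the response at every position (valid cross-correlation)."""
    H, W = spec.shape
    h, w = patch.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(spec[i:i + h, j:j + w] * patch)
    return out

patch = np.ones((2, 2))                 # local detector
spec = np.zeros((6, 6)); spec[1:3, 1:3] = 1.0       # pattern at one spot
shifted = np.zeros((6, 6)); shifted[3:5, 2:4] = 1.0  # same pattern, moved
r1 = xcorr2d(spec, patch)
r2 = xcorr2d(shifted, patch)
# the peak response is identical; only its position changes
```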
“…In the development of automatic speech recognition systems, attention continues to be paid to end-to-end methods; many studies have shown that performance and accuracy improve as the amount of training data grows. For example, in published studies, the best results on large datasets were obtained by end-to-end systems based on CTC [5,6] and by attention-based encoder-decoder models. In end-to-end models, all parameters are estimated by gradient descent, so performance is strongly influenced by the structure of the neural network.…”
Section: Introduction
confidence: 99%
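The CTC objective mentioned in this statement maps many frame-level label paths onto one output sequence. Its collapse rule can be sketched as a short, self-contained Python function (a simplified illustration of CTC decoding, not the cited systems' implementation): merge consecutive repeats, then drop the blank symbol.

```python
def ctc_collapse(path, blank="-"):
    """Map a frame-level CTC path to an output label sequence:
    merge repeated symbols, then remove blanks. The blank lets
    CTC emit genuine doubled letters by separating them."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)
```

For example, the frame path `hh-e-ll-lo` collapses to `hello`: the blank between the two `l` runs is what preserves the doubled letter. Training then maximises the total probability of all paths that collapse to the reference transcript.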