2021
DOI: 10.21608/ejle.2020.47685.1015
|View full text |Cite
|
Sign up to set email alerts
|

Convolutional Neural Network for Arabic Speech Recognition

Abstract: This work is focused on single word Arabic automatic speech recognition (AASR). Two techniques are used during the feature extraction phase; Log frequency spectral coefficients (MFSC) and Gammatone-frequency cepstral coefficients (GFCC) with their first and second-order derivatives. The convolutional neural network (CNN) is mainly used to execute feature learning and classification process. CNN achieved performance enhancement in automatic speech recognition (ASR). Local connectivity, weight sharing, and pooli… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
7

Relationship

0
7

Authors

Journals

citations
Cited by 14 publications
(2 citation statements)
references
References 34 publications
0
2
0
Order By: Relevance
“…The neurons in the convolution layer use a set of filters (weight matrix V ) to perform convolution calculations with some neurons in the previous layer. This convolution structure exploits the shift-invariance and spatial correlation of target features [34], where shift-invariance refers to the property of the convolution operation that it produces the same result regardless of the position of the input features in the receptive field, and spatial correlation refers to the statistical dependence between nearby features in the image. To reduce redundancy, a set of weight matrices is shared among all receptive fields on the same layer during convolution operations, while different feature maps employ distinct weight matrices.…”
Section: A Complex-valued Convolution Layermentioning
confidence: 99%
“…The neurons in the convolution layer use a set of filters (weight matrix V ) to perform convolution calculations with some neurons in the previous layer. This convolution structure exploits the shift-invariance and spatial correlation of target features [34], where shift-invariance refers to the property of the convolution operation that it produces the same result regardless of the position of the input features in the receptive field, and spatial correlation refers to the statistical dependence between nearby features in the image. To reduce redundancy, a set of weight matrices is shared among all receptive fields on the same layer during convolution operations, while different feature maps employ distinct weight matrices.…”
Section: A Complex-valued Convolution Layermentioning
confidence: 99%
“…CNNs are widely used in Arabic speech recognition for their ability to capture local patterns and features in speech data [115], [116]. The paper [117] focuses on Arabic ASR using MFSC and GFCC with their first and second-order derivatives. The utilization of CNN facilitates feature learning and classification, leading to enhanced performance in Arabic ASR.…”
Section: Convolutional Neural Network (Cnn)mentioning
confidence: 99%