2019
DOI: 10.1109/access.2019.2912648

Mispronunciation Detection Using Deep Convolutional Neural Network Features and Transfer Learning-Based Model for Arabic Phonemes

Abstract: Computer-assisted language learning (CALL) systems provide an automated framework to identify mispronunciation and give useful feedback. Traditionally, handcrafted acoustic-phonetic features are used to detect mispronunciation. Following this line of research, this paper investigates the use of a deep convolutional neural network for mispronunciation detection of Arabic phonemes. We propose two methods with different techniques, i.e., a convolutional neural network features (CNN_Features)-based technique and a tran…

Cited by 40 publications (19 citation statements); references 29 publications (41 reference statements).

“…In our work, we used deep learning and machine learning methods for the classification process. To input data into our classifiers, deep features were extracted [37], [38] from our speech signal dataset. We used the deep convolutional model AlexNet to extract deep features from the PC-GITA dataset.…”
Section: B Feature Extraction: Deep Features Using AlexNet Model (mentioning)
confidence: 99%
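The AlexNet feature-extraction step quoted above can be illustrated with a short sketch. This is a minimal, hypothetical example assuming spectrogram images as input and torchvision's pretrained AlexNet with its last classification layer removed; it is not the citing authors' pipeline, and the file path and layer choice are assumptions.

```python
# Minimal sketch: deep features from spectrogram images with a pretrained AlexNet.
# Assumes torchvision; not the citing paper's exact setup.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

alexnet = models.alexnet(weights="DEFAULT")
alexnet.eval()

# Convolutional trunk plus all but the final classifier layer -> 4096-dim features.
feature_extractor = torch.nn.Sequential(
    alexnet.features,
    alexnet.avgpool,
    torch.nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],  # drop the 1000-way output layer
)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(spectrogram_path):
    """Return a 4096-dim AlexNet feature vector for one spectrogram image."""
    img = Image.open(spectrogram_path).convert("RGB")  # hypothetical input image
    with torch.no_grad():
        feats = feature_extractor(preprocess(img).unsqueeze(0))
    return feats.squeeze(0).numpy()
```

The extracted vectors would then be fed to a downstream classifier, as the citing statement describes.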
“…For phone-level recognition systems, the word error rate is referred to as the phone error rate (PER) [76]. In our experiments, we used the HResults analysis tool from the HTK toolkit [77] to calculate the PER, which is computed by (3).…”
Section: Phone Error Rate (PER) (mentioning)
confidence: 99%
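Equation (3) is not reproduced in this excerpt, but the standard phone error rate it refers to is the phone-level analogue of word error rate: substitutions plus deletions plus insertions, divided by the number of reference phones. A minimal sketch, assuming a plain Levenshtein alignment rather than HTK's HResults tool (which may differ in tie-breaking details):

```python
# Minimal sketch of phone error rate (PER) via a Levenshtein alignment.
def phone_error_rate(reference, hypothesis):
    ref, hyp = list(reference), list(hypothesis)
    # dp[i][j] = minimum edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution and one deletion over five reference phones -> PER 0.4
print(phone_error_rate(["b", "aa", "t", "i", "l"], ["b", "aa", "d", "i"]))
```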
“…Phoneme recognition plays a dominant role in many applications such as speech recognition [1], speaker recognition [2], and pronunciation error detection and correction [3]. With the success of deep learning techniques in computer vision, many studies have been conducted on speech processing tasks by converting speech signals into a visual representation such as a spectrogram [4].…”
Section: Introduction (mentioning)
confidence: 99%
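The spectrogram conversion mentioned above can be sketched as follows; the librosa calls and parameter values (16 kHz sampling, 64 mel bands, the input filename) are illustrative assumptions, not settings taken from the cited work.

```python
# Minimal sketch: turning a speech waveform into a log-mel spectrogram image
# suitable for CNN-style processing.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("phoneme.wav", sr=16000)  # hypothetical input file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512, hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

plt.figure(figsize=(4, 3))
librosa.display.specshow(log_mel, sr=sr, hop_length=160, x_axis="time", y_axis="mel")
plt.axis("off")
plt.savefig("phoneme_spectrogram.png", bbox_inches="tight", pad_inches=0)
plt.close()
```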
“…Moreover, this approach is also common for solving regression problems. The performance of the DL model improves as the amount of data increases [55]. In this study, we use a DL model based on a convolutional neural network with the standard setting.…”
Section: Support Vector Regression (mentioning)
confidence: 99%
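A minimal sketch of a CNN applied to a regression target follows, assuming a small PyTorch architecture and an MSE loss; the citing study's "standard setting" is not specified in this excerpt, so every layer size, input shape, and hyperparameter here is an assumption.

```python
# Minimal sketch of a CNN used for regression (single continuous output).
import torch
import torch.nn as nn

class CNNRegressor(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, 1),  # single regression output
        )

    def forward(self, x):
        return self.head(self.conv(x))

model = CNNRegressor()
criterion = nn.MSELoss()  # regression loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random data shaped like small spectrograms.
x = torch.randn(8, 1, 64, 101)
y = torch.randn(8, 1)
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```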