In recent years, the accuracy of speech recognition (SR) has been one of the most active areas of research. Despite that SR systems are working reasonably well in quiet conditions, they still suffer severe performance degradation in noisy conditions or distorted channels. It is necessary to search for more robust feature extraction methods to gain better performance in adverse conditions. This paper investigates the performance of conventional and new hybrid speech feature extraction algorithms of Mel Frequency Cepstrum Coefficient (MFCC), Linear Prediction Coding Coefficient (LPCC), perceptual linear production (PLP), and RASTA-PLP in noisy conditions through using multivariate Hidden Markov Model (HMM) classifier. The behavior of the proposal system is evaluated using TIDIGIT human voice dataset corpora, recorded from 208 different adult speakers in both training and testing process. The theoretical basis for speech processing and classifier procedures were presented, and the recognition results were obtained based on word recognition rate.
Speech recognition allows the machine to turn the speech signal into text through identification and understanding process. Extract the features, predict the maximum likelihood, and generate the models of the input speech signal are considered the most important steps to configure the Automatic Speech Recognition System (ASR). In this paper, an automatic Arabic speech recognition system was established using MATLAB and 24 Arabic words Consonant-Vowel Consonant-Vowel Consonant-Vowel (CVCVCV) was recorded from 19 Arabic native speakers, each speaker uttering the same word 3 times (total 1368 words). In order to test the system, 39-features were extracted by partitioning the speech signal into frames ~ 0.25 sec shifted by 0.10 sec. in back-end, the statistical models were generated by separated the features into number of states between 4 to 10, each state has 8-gaussian distributions. The data has 48 k sample rate and 32-bit depth and saved separately in a wave file format. The system was trained in phonetically rich and balanced Arabic speech words list (10 speakers * 3 times * 24 words, total 720 words) and tested using another word list (24 words * 9 speakers * 3 times *, total 648 words). Using different speakers similar words, the system obtained a very good word recognition accuracy results of 92.92% and a Word Error Rate (WER) of 7.08%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.