Speech processing is an intensively researched area of digital signal processing with many applications. A major factor in the quality of a speech processing system is the choice of feature extraction technique, which plays a major role in achieving higher accuracy. This paper presents the most commonly used feature extraction techniques: Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Relative Spectral Perceptual Linear Prediction (RASTA-PLP) and the Wavelet Transform (WT). Comparisons that highlight the strengths and weaknesses of these techniques are also presented. Studies show that feature extraction techniques are mainly selected according to the requirements of the application. The wavelet transform outperforms the other techniques for the analysis of non-stationary components of audio signals. Enhanced wavelet transform techniques are a way forward, and further studies can focus on their coefficients. Hybrid methods can be explored further to improve performance in speech processing. A number of hybrid methods were reviewed, and studies show that Mel-Frequency Cepstral Coefficients (WPCC) provide better results for speech processing applications with standard coefficient for classification.
Al-Quran is the most recited holy book in the Arabic language. Over 1.3 billion Muslims all over the world have an obligation to recite and learn Al-Quran. Learners from non-Arabic as well as Arabic-speaking communities face difficulties with Al-Quran recitation when no teacher (ustad) is present. Advances in speech recognition technology make it possible to develop a system that can listen to a recitation and validate it. This paper investigates the speech recognition accuracy of template-based acoustic models and proposes enhancement methods to improve that accuracy. A new scheme consisting of enhanced Normalized Least Mean Square (NLMS) and Dynamic Time Warping (DTW) algorithms is proposed. Recognition accuracy was further improved by incorporating adaptive optimal filtering with a modified Hamming window for Mel-Frequency Cepstral Coefficient (MFCC) extraction, and by matching with dynamic programming (DP) in the form of DTW. The proposed scheme achieves a 5.5% relative improvement in recognition accuracy over the conventional speech recognition process.
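As an illustration of the template-based matching mentioned above, the following is a minimal sketch of textbook Dynamic Time Warping solved by dynamic programming. It is not the enhanced scheme proposed in the paper, whose details the abstract does not give. The function names `frame_distance`, `dtw_distance` and `recognize`, the Euclidean frame distance, and the toy one-dimensional "feature vectors" are all assumptions made for this example; in practice each frame would be an MFCC vector.

```python
from math import inf, sqrt


def frame_distance(x, y):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(a, b):
    """Cumulative DTW alignment cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(a[i - 1], b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the stored template with the lowest DTW cost to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical templates: label -> sequence of (toy, one-dimensional) feature frames.
templates = {
    "a": [[0.0], [1.0]],
    "b": [[5.0], [6.0]],
}
best = recognize([[0.0], [0.0], [1.0]], templates)
```

Because DTW allows a frame in one sequence to align with several frames in the other, a slower or faster recitation of the same utterance can still match its template at low cost, which is why it suits template-based recognition.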