Robust Feature Extraction Using Autocorrelation Domain for Noisy Speech Recognition

Farahani, Gholamreza

doi:10.5121/sipij.2017.8103

Cited by 4 publications

(3 citation statements)

References 19 publications


“…ASR systems have principally focused on phoneme, word, hence sentence decoding and identification of speaker using various algorithms and techniques like LPC [8] [13] with Hidden Markov Model [14][16][18] [20] where they predict the output based on expectation maximization by reducing error. The ASR systems and speaker identification application includes auto correlation analysis and LPC analysis [8], [13], [12], it is revealed that features extracted using MFCC perform better compared to LPC [9] in speech recognition. With regard to acoustic feature extraction, researchers used Mel Frequency Cepstral Coefficients, since cepstral coefficients mimics human perception [2], [8], [4], [5], [10], [13].…”

Section: Objectives (mentioning)

confidence: 99%

Experimental Analysis on Performance of Speech Utterance recognition using AI Models

Srikanth G N, M K Venkatesha

2023

tjjpt


In automated speech recognition, system performance is crucial to satisfying the multiple requirements of HMI and, more recently, IoT-related applications. Concurrently, there has been an increase in demand for detecting strong critical features derived from speech utterances. This paper presents the performance of the developed machine learning algorithms on spoken-digit recognition and classification. The prepared dataset contains a range of words (from 1 to 10) from speakers of different age groups. Audacity was used to preprocess the audio files, which included removing noise from the signal and trimming the silence on either side of the word utterance. The audio signal is sampled at fs = 48 kHz. We developed four AI models to recognise the word utterances. The audio signals are processed separately to derive two feature sets for each word utterance: a statistical feature set and a set of singular values obtained by SVD. The cepstral values for each utterance are obtained from state-of-the-art MFCC. The variance-covariance matrix is calculated from the generated MFCC matrix. Its diagonal values, which form the variances, are recorded as feature set-1 for the word utterance and input to the machine learning algorithms. Performance matrices of the developed models are recorded. To keep the computational bottleneck associated with the feature sets to a minimum, dimensionality reduction is carried out by applying singular value decomposition to the extracted MFCC matrix. The derived set of singular values, taken as feature set-2, is used to train and test the developed AI models with a 70:30 split. We present and discuss the performance and results produced by the MLP, KNN, SVM, and Random Forest algorithms. In comparison, MLP and Random Forest showed excellent performance on both feature sets, with 100% training accuracy and 99% test accuracy.
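The two feature sets described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the MFCC matrix is replaced by random numbers, its 13 x 40 shape (coefficients by frames) is an assumption, and the function names are hypothetical. Only the variance-covariance diagonal, the singular values, and the 70:30 split come from the abstract.

```python
import numpy as np

def variance_features(mfcc):
    """Feature set-1: diagonal of the variance-covariance matrix.

    mfcc has shape (n_coefficients, n_frames); each row is treated as
    one variable, so the diagonal holds one variance per coefficient.
    """
    cov = np.cov(mfcc, rowvar=True)
    return np.diag(cov)

def singular_value_features(mfcc):
    """Feature set-2: singular values of the MFCC matrix (descending order)."""
    return np.linalg.svd(mfcc, compute_uv=False)

def split_70_30(n_items, rng):
    """Shuffle item indices and split them 70:30 into train and test."""
    order = rng.permutation(n_items)
    n_train = (n_items * 7) // 10
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 40))  # stand-in for a real MFCC matrix (assumed shape)

set1 = variance_features(mfcc)
set2 = singular_value_features(mfcc)
train_idx, test_idx = split_70_30(10, rng)
```

Both feature vectors have a fixed length (one value per cepstral coefficient when there are more frames than coefficients), regardless of how many frames the utterance has, which is what lets utterances of different durations feed the same classifier.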


“…However, the volume of the voice data files after the aggravation processing is often very large. Therefore, in order to reduce the burden of computer processing and improve the data processing capability, it is necessary to segment the speech signal [12]. Segmentation processing is a common method for computers to process general data, which is manifested in speech signals by dividing the signal into frames.…”

Section: Pre-emphasis and Framing (mentioning)

confidence: 99%

Voice Timbre Evaluation of Broadcast Host Based on Extraction of Voice Feature Parameters

2022

IJAID


The continuous development of science and technology has steadily improved people's quality of life. People's definition of health has also become clearer, and more attention is paid to laryngeal diseases and the quality of voice. In real life, accurate evaluation of voice quality and timbre is not only beneficial to the diagnosis and treatment of laryngeal diseases but also plays a vital role in the selection and training of broadcasting and hosting talents. The current evaluation methods for professional voices are time-consuming, labor-intensive, and highly subjective. In order to overcome these defects and explore how to help the training of broadcast hosts, this paper conducts in-depth research on the processing of speech signals and the objective evaluation of artistic voice timbre from the perspective of signal processing. Through experimental analysis, it is found that the objective evaluation method of voice timbre based on the acoustic parameters F0, F1, F3 can better realize the objective evaluation of the voice timbre of the broadcast host. Among them, the accuracy rate of the evaluation method based on multiple feature parameter extraction reaches 89.2%.
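The citation statement for this paper describes dividing the speech signal into frames, in a section titled "Pre-emphasis and Framing". A minimal sketch of those two steps follows. The filter coefficient 0.97, the 16 kHz sampling rate, and the 25 ms / 10 ms frame and hop lengths are common choices assumed here for illustration; none of them is taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples.

    Assumes len(signal) >= frame_len. Trailing samples that do not
    fill a whole frame are dropped.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

fs = 16000                                  # assumed sampling rate in Hz
t = np.arange(1000) / fs
speech = np.sin(2 * np.pi * 200 * t)        # stand-in for a speech signal

emphasized = pre_emphasize(speech)
frames = frame_signal(emphasized, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
```

Each row of `frames` can then be processed independently (windowing, spectrum, cepstrum), which is the reduction in per-step processing load that the quoted passage refers to.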


“…Disken et al [5] proposed an algorithm showed superior verification performance both with the conventional GMM-universal background model and universal background model (UBM) method, and the state of-the-art i-vector method. Farahani [6] discussed the robust features extractions using autocorrelation domain for noisy speech recognition. This paper depicted a straightforward and compelling strategy for diminishing the impact of clamor on the autocorrelation of the perfect flag.…”

Section: Literature Review (mentioning)

confidence: 99%

Speech Recognition using Cross Correlation Algorithm Intended for Noise Reduction

Kaur, Baghla

2018

AJCST


Biometrics is presently a buzzword in the domain of information security, as it provides a high degree of accuracy in identifying an individual. Speech recognition is the ability of a machine or program to identify words and phrases in spoken language and convert them to a machine-readable format. Rudimentary speech recognition software has a limited vocabulary of words and phrases, and it may only identify these if they are spoken very clearly. The research work is intended to build a GUI environment that provides a means to record speech and assists in growing the database. It is primarily focused on implementing a system capable of recognizing a user's speech and creating audio files that can be added to build a dynamic template or database. The work emphasizes directly recording the spoken words, avoiding the problems associated with the use of a microphone. After appropriate recording and removal of the noise, the best-matched audio file from the template is recognized for an externally provided input on the basis of graphs created by considering correlation.
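The correlation-based matching mentioned at the end of this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the scoring rule (peak of the mean-removed, energy-normalized cross-correlation), the synthetic sinusoidal "recordings", and the names `peak_normalized_xcorr` and `best_match` are all hypothetical.

```python
import numpy as np

def peak_normalized_xcorr(a, b):
    """Peak of the cross-correlation of a and b over all lags.

    Both signals are mean-removed and the result is divided by the
    product of their norms, so the score cannot exceed 1.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.max(np.correlate(a, b, mode="full")) / denom)

def best_match(query, templates):
    """Return the name of the template that correlates best with the query."""
    scores = {name: peak_normalized_xcorr(query, t) for name, t in templates.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(1)
t = np.arange(800) / 8000.0                  # 0.1 s at an assumed 8 kHz rate
templates = {
    "one": np.sin(2 * np.pi * 300 * t),      # synthetic stand-ins for stored words
    "two": np.sin(2 * np.pi * 700 * t),
}
# Query: a delayed, slightly noisy copy of template "one".
query = np.concatenate([np.zeros(50), templates["one"] + 0.05 * rng.normal(size=800)])

name, scores = best_match(query, templates)
```

Taking the maximum over all lags makes the score insensitive to where the word starts in the recording, which is why the 50-sample delay in the query does not prevent a match.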

