Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients

Tamazin, Mohamed; Gouda, Ahmed M.; Khedr, Mohamed

doi:10.3390/app9102166

Cited by 22 publications (13 citation statements); references 25 publications

“…The TIDIGITS (Leonard and Doddington, 1993) (LDC Catalog No. LDC93S10) is a speech corpus of spoken digits for speaker-independent speech recognition (Cooke et al, 2001;Tamazin et al, 2019). The speakers are from different genders (male and female), age ranges (adults and children), dialect districts (Boston, Richmond, Lubbock, etc.).…”

Section: Spike-TIDIGITS and Spike-TIMIT Databases (mentioning)

confidence: 99%

An Efficient and Perceptually Motivated Auditory Neural Encoding and Decoding Algorithm for Spiking Neural Networks

Pan

Chua

et al. 2020

Front. Neurosci.

The auditory front-end is an integral part of a spiking neural network (SNN) when performing auditory cognitive tasks. It encodes temporally dynamic stimuli, such as speech and audio, into an efficient, effective, and reconstructable spike pattern to facilitate subsequent processing. However, most of the auditory front-ends in current studies have not made use of recent findings in psychoacoustics and physiology concerning human listening. In this paper, we propose a neural encoding and decoding scheme that is optimized for audio processing. The neural encoding scheme, which we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, which include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and spike neural encoding by the auditory nerve. We evaluate the perceptual quality of the BAE scheme using PESQ, and the performance of the BAE through sound classification and speech recognition experiments. Finally, we also built and published spike versions of two speech datasets, Spike-TIDIGITS and Spike-TIMIT, for researchers to use in benchmarking future SNN research.

“…Typically, the MFCC and the perceptual linear predictive (PLP) [10] techniques are evaluated as the most widely used techniques in speech and speaker recognition systems. However, the PLP method relative spectral (RASTA) [10] filtering is combined with the feature extraction technique to remove channel noises compared to the speech signal. Recently, the enhanced automatic speech recognition system based on enhancing PNCC has been presented [10].…”
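The quote above mentions RASTA filtering only in passing. As a hedged illustration that is not taken from the cited papers, the classic RASTA filter of Hermansky and Morgan is a band-pass IIR filter applied along time to each log-spectral band trajectory. The sketch below writes it in causal form, which delays the output by four frames relative to the original non-causal definition. The function name `rasta_filter`, the zero initial conditions, and the default pole of 0.98 (some implementations use 0.94) are assumptions of this sketch.

```python
def rasta_filter(trajectory, pole=0.98):
    """Filter one log-spectral band trajectory (one value per frame) with RASTA.

    Causal difference equation (sketch; zero initial conditions assumed):
        y[n] = pole * y[n-1] + 0.2 x[n] + 0.1 x[n-1] - 0.1 x[n-3] - 0.2 x[n-4]
    """
    # FIR part: a smoothed differentiator whose taps sum to zero, so a constant
    # offset (e.g. a fixed channel, which is additive in the log domain) is rejected.
    taps = [0.2, 0.1, 0.0, -0.1, -0.2]
    output = []
    y_prev = 0.0
    for n in range(len(trajectory)):
        fir = sum(b * trajectory[n - k] for k, b in enumerate(taps) if n - k >= 0)
        # IIR part: a single pole acting as a leaky integrator.
        y_prev = pole * y_prev + fir
        output.append(y_prev)
    return output
```

For example, `rasta_filter([5.0] * 500)` returns values that decay geometrically towards zero once the four-frame start-up transient has passed, which is the channel-offset suppression the RASTA step is meant to provide.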

Section: Introduction (mentioning)

confidence: 99%

“…However, the PLP method relative spectral (RASTA) [10] filtering is combined with the feature extraction technique to remove channel noises compared to the speech signal. Recently, the enhanced automatic speech recognition system based on enhancing PNCC has been presented [10]. PNCC also proposes are estimated over a long duration that is commonly used for speech, as well as frequency smoothing.…”
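The quote above refers to PNCC power estimates computed over a longer duration. In the original PNCC formulation by Kim and Stern, a "medium-time" power is obtained by averaging short-time power over neighbouring frames, and a power-law nonlinearity with exponent 1/15 replaces the logarithm used in MFCC. The minimal sketch below covers only those two steps; it omits noise suppression, temporal masking, and frequency smoothing, and it is not the enhanced method of Tamazin et al. The function names, the edge handling, and the default M = 2 are assumptions.

```python
def medium_time_power(power, M=2):
    """Average each channel's power over the 2M+1 frames centred on each frame.

    power: list of frames, each a list of per-channel power values.
    Frames near the edges are averaged over the frames that exist (assumption).
    """
    n_frames = len(power)
    n_chan = len(power[0]) if power else 0
    out = []
    for m in range(n_frames):
        lo, hi = max(0, m - M), min(n_frames - 1, m + M)
        count = hi - lo + 1
        out.append([sum(power[k][l] for k in range(lo, hi + 1)) / count
                    for l in range(n_chan)])
    return out


def power_law(power, exponent=1.0 / 15.0):
    """PNCC-style power-law compression, used instead of the MFCC logarithm."""
    return [[p ** exponent for p in frame] for frame in power]
```

Unlike the logarithm, the power-law curve stays finite at zero power, which is one reason it is preferred for low-energy, noise-dominated channels.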

Section: Introduction (mentioning)

confidence: 99%

Enhanced Feature Extraction Based on Absolute Sort Delta Mean Algorithm and MFCC for Noise Robustness Speech Recognition

Nosan, Sitjongsataporn

2021

IJIES

In this paper, an absolute sort delta mean (ASDM) method for noise-robust speech feature extraction, named ASDM-MFCC, is developed from mel-frequency cepstral coefficients (MFCC) in order to increase robustness against different types of environmental noise. The method suppresses noise effects by computing a rearranged average of the power spectrum magnitude combined with triangular band-pass filtering. First, the spectral power magnitudes are sorted in each frequency band of the speech signal. Second, the absolute delta values are arranged, and finally a mean value is determined. The purpose of the proposed ASDM-MFCC algorithm is to improve the noise robustness of the feature vector of characteristic coefficients extracted from the speech signal. The NOIZEUS noisy speech corpus is used to evaluate the performance of the proposed ASDM-MFCC algorithm with a low-complexity Euclidean distance method. Experimental results show that the proposed method significantly improves accuracy at low signal-to-noise ratio (SNR). For car and station noise at SNR = 5 dB, the proposed approach outperforms the conventional MFCC and gammatone frequency cepstral coefficients (GFCC) by 80% and 76.67%, respectively. Some experimental results show that the proposed ASDM-MFCC algorithm is more robust than the traditional one.
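The ASDM-MFCC abstract mentions triangular band-pass filtering, the mel filterbank stage of standard MFCC extraction on which the method builds. The ASDM sorting and averaging steps are not described here in enough detail to reproduce, so the following sketch covers only a conventional triangular mel filterbank. The HTK-style mel formula, the function names (`hz_to_mel`, `mel_to_hz`, `mel_filterbank`, `band_energies`), and all parameter values are assumptions for illustration, not details from the paper.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters evenly spaced on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights that map a
    one-sided power spectrum to band energies.
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    mel_lo, mel_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 edge points: each triangle spans three consecutive points.
    mel_points = [mel_lo + i * (mel_hi - mel_lo) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def band_energies(power_spectrum, bank):
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]
```

For example, `mel_filterbank(26, 512, 16000)` builds 26 filters over the 257 bins of a 512-point FFT at 16 kHz; in conventional MFCC, the logarithm of the resulting band energies is then passed through a DCT.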

“…However, this technique [decreased] the calculation [speed] and massively required more computational resources. A modified approach of power normalized cepstral coefficient system by utilizing the large time power and minimizing the channel bias was presented [8]. They intended to increase the noise robustness of the system.…”

Section: Introduction (mentioning)

confidence: 99%

Discrete Wavelet Denoising into MFCC for Noise Suppressive in Automatic Speech Recognition System

Naing, Hidayat, Hartanto

et al. 2020

IJIES

Automatic Speech Recognition (ASR) is a challenging task, and the most problematic issues are the presence of background noise and substantial variability in speech. Extracting noise-robust features that compensate for speech degradation caused by noise has remained a popular issue in recent years. This paper presents a framework for a wavelet denoising scheme and analyses different wavelet families and appropriate thresholding rules within feature extraction to enhance the performance of the ASR system. A Gaussian Mixture Model-based Hidden Markov Model (GMM-HMM) and a Deep Neural Network (DNN)-HMM are used as speech recognizers. The recognition results show that noise-robust features are obtained by combining wavelet transform denoising with Mel Frequency Cepstral Coefficients (MFCC) on the Aurora2 database. The best accuracy is obtained by cross-entropy DNN-HMM training using denoising with the Coiflet wavelet and the Rigrsure threshold: 97.54% at 10 dB, 93.13% at 5 dB, 75.63% at 0 dB, and 37.29% at −5 dB.
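The abstract describes wavelet-transform denoising with a thresholding rule applied before MFCC extraction. The sketch below is a deliberately simplified stand-in, not the paper's configuration: it uses the Haar wavelet instead of the Coiflet wavelet, and Donoho and Johnstone's universal threshold with soft thresholding instead of the Rigrsure rule. The function names, the three decomposition levels, and the median-absolute-deviation noise estimate are assumptions.

```python
import math
import statistics


def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(values, t):
    """Shrink each coefficient towards zero by t, zeroing those below t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def wavelet_denoise(signal, levels=3):
    """Haar wavelet denoising with a universal soft threshold (sketch)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)  # details[0] is the finest scale
    # Noise level estimated from the finest-scale details (MAD / 0.6745).
    sigma = statistics.median(abs(v) for v in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(signal)))
    details = [soft_threshold(d, t) for d in details]
    for d in reversed(details):  # rebuild from the coarsest level upwards
        approx = haar_idwt(approx, d)
    return approx
```

The approximation coefficients are left untouched because, for speech-like signals, most of the signal energy concentrates there, while detail coefficients at the finer scales are dominated by broadband noise.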