Histogram based normalization in the acoustic feature space

Molau, Sirko; Pitz, Michael; Ney, Hermann

doi:10.1109/asru.2001.1034579

Cited by 51 publications

(40 citation statements)

References 6 publications

“…HQ was verified to be able to handle various noisy conditions including non-stationary noisy environments [5,6]. The Histogram Equalization (HEQ) has been proposed and popularly used to equalize the cumulative distributions (or histograms) of both the training and testing feature parameters, and shown to produce very robust features for recognition [8,9,10]. The HEQ can be viewed as the limiting case of HQ proposed here when the number of the HQ quantization levels becomes infinite.…”

Section: Basic Formulation of HQ (mentioning)

confidence: 98%

Three-Stage Error Concealment for Distributed Speech Recognition (DSR) with Histogram-Based Quantization (HQ) Under Noisy Environment

Wang

Chen

Lee

2007

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07

In this paper, a three-stage error concealment (EC) framework based on the recently proposed Histogram-based Quantization (HQ) for Distributed Speech Recognition (DSR) is proposed, in which noisy input speech is assumed and both the transmission errors and environmental noise are considered jointly. The first stage detects the erroneous feature parameters at both the frame and subvector levels. The second stage then reconstructs the detected erroneous subvectors by MAP estimation, considering the prior speech source statistics, the channel transition probability, and the reliability of the received subvectors. The third stage then considers the uncertainty of the estimated vectors during Viterbi decoding. At each stage, the error concealment (EC) techniques properly exploit the inherent robust nature of Histogram-based Quantization (HQ). Extensive experiments with AURORA 2.0 testing environment and GPRS simulation indicated the proposed framework is able to offer significantly improved performance against a wide variety of environmental noise and transmission error conditions.

“…The rest of the ASR system can be seen from [9]. There are different possible positions for pHEQ in the front-end [4]. In this paper, pHEQ is applied before the Mel-filter bank, since this configuration consistently outperformed the alternatives in our preliminary experiments.…”

Section: Parametric Histogram Equalization (mentioning)

confidence: 99%

“…Both during training and testing the observed data is transformed as to match the target CDF as good as possible. The observation and target probability density functions (PDFs) $p_X(P_{\log X})$ and $p_Y(P_{\log Y})$ can be approximated reasonably well by a bimodal Gaussian process [4]. The bimodal Gaussian statistics form a simple Gaussian Mixture Models (GMM) for which the parameters can be efficiently estimated using Expectation Maximization (EM).…”

Section: Parametric Histogram Equalization (mentioning)

confidence: 99%
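
The bimodal Gaussian model mentioned in the statement above can be illustrated with a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture. This is not the citing authors' implementation: the function name `fit_bimodal_gmm`, the median-split initialisation, the fixed iteration count, and the small variance floor are all assumptions made for illustration.

```python
import numpy as np


def fit_bimodal_gmm(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative sketch)."""
    x = np.asarray(x, dtype=float)

    # Assumed initialisation: split the sample at its median.
    median = np.median(x)
    low, high = x[x <= median], x[x > median]
    means = np.array([low.mean(), high.mean()])
    variances = np.array([low.var(), high.var()]) + 1e-6
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample,
        # computed in the log domain for numerical stability.
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)
                          + (x[:, None] - means) ** 2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_joint -= log_joint.max(axis=1, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        counts = resp.sum(axis=0)
        weights = counts / x.size
        means = (resp * x[:, None]).sum(axis=0) / counts
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / counts + 1e-6

    return weights, means, variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for log-power values: one low mode and one high mode.
    sample = np.concatenate([rng.normal(-2.0, 0.5, 3000), rng.normal(3.0, 1.0, 2000)])
    print(fit_bimodal_gmm(sample))
```

On well-separated synthetic data such as the sample above, the estimated means should land close to the two generating modes; real log-power data may need a more careful initialisation.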

“…Histogram equalization (HEQ) on the other hand can cope with non-linear transformations [1,4]. The principal idea of histogram equalization is to transform the distribution of the observed acoustic feature vectors as to match a target distribution [4].…”

Section: Introduction (mentioning)

confidence: 99%

“…The principal idea of histogram equalization is to transform the distribution of the observed acoustic feature vectors as to match a target distribution [4]. Another nonlinear technique is noise masking [5,6].…”

Section: Introduction (mentioning)

confidence: 99%
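
The principle quoted above, transforming the observed feature distribution so that it matches a target distribution, can be sketched non-parametrically for a single feature dimension. The sketch below is a generic rank-based CDF-matching routine, not the method of any paper listed here; the name `histogram_equalize` and the use of a reference sample to represent the target distribution are assumptions.

```python
import numpy as np


def histogram_equalize(observed, reference):
    """Map observed values so their empirical distribution matches the reference sample."""
    observed = np.asarray(observed, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))

    # Empirical CDF position of each observed value within its own sample.
    ranks = np.argsort(np.argsort(observed))
    cdf = (ranks + 0.5) / observed.size

    # Inverse reference CDF: interpolate over the sorted reference sample.
    ref_cdf = (np.arange(reference.size) + 0.5) / reference.size
    return np.interp(cdf, ref_cdf, reference)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, 5000)               # stand-in for training features
    distorted = np.exp(rng.normal(0.5, 0.3, 5000))    # non-linearly distorted test features
    equalized = histogram_equalize(distorted, target)
    print(equalized.mean(), equalized.std())
```

Because the mapping is monotonic and built from the two cumulative distributions, it can undo non-linear distortions that a mean or variance normalization cannot, which is the property the citing statements highlight.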

Histogram equalization and noise masking for robust speech recognition

Zhang

Demuynck

Van hamme

2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing

Mismatch between training and test conditions deteriorates the performance of speech recognizers. This paper investigates the combination of parametric histogram equalization (pHEQ) and noise masking to compensate for the mismatch caused by additive noise. The proposed front-end maps the distribution of the observed power spectrum vectors to a target distribution. The target distribution matches the distribution of the noise free training data except for an artificially reduced signal-to-noise ratio. Different power spectrum estimation algorithms are used to estimate the noise distribution as used internally by pHEQ more reliably under nonstationary noise conditions. The proposed front-end is evaluated on the Aurora4 database and shows a significant improvement w.r.t. mean-normalized Mel-frequency spectral coefficients. Moreover, the performance could be further improved if better estimates of the instantaneous noise power spectrum were available.
