A perceptual masking approach for noise robust speech recognition

Maganti, Hari Krishna; Matassoni, Marco

doi:10.1186/1687-4722-2012-29

Cited by 10 publications

(13 citation statements)

References 10 publications

(16 reference statements)


“…Type 4: II requires three HEQ operations, the same as S-HEQ, demonstrating that S-HEQ and WS-HEQ (1) II are similar in computational complexity.…”

Section: Proposed Approach: WS-HEQ (mentioning)

confidence: 96%

“…Equations 12 to 15, by assigning α as less than 1.0, WS-HEQ (1), which requires three HEQ operations, displays the best behavior, regardless of the selected structure. However, the two types that require only two HEQ operations (i.e., WS-HEQ (2) and …”

Section: Among the Four Types of WS-HEQ Listed In (mentioning)

confidence: 99%

“…A significant number of noise-robustness techniques have been proposed to address the noise problem, and one prevailing subset of these techniques is focused on reducing the statistical mismatch of speech features in the training and testing conditions of the recognizer. Typical examples are perceptual masking [1], empirical mode decomposition [2], optimally modified log-spectral amplitude estimation [3], wavelet packet decomposition with AR modeling [4], cepstral mean and variance normalization (MVN) [5], cepstral histogram normalization (CHN) [6,7], MVN with ARMA filtering (MVA) [8], higher order cepstral moment normalization (HOCMN) [9], and temporal structure normalization (TSN) [10]. In some of these methods, the compensation is performed on each individual cepstral channel sequence of an utterance by assuming that these channels are mostly uncorrelated [7].…”

Section: Introduction (mentioning)

confidence: 99%
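The per-channel compensation mentioned in the statement above can be illustrated with cepstral mean and variance normalization (MVN). The following is a minimal sketch only, not code from any of the cited papers; the function name `cepstral_mvn`, the list-of-frames layout, and the `eps` guard are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def cepstral_mvn(features, eps=1e-8):
    """Normalize each cepstral channel of one utterance to zero mean, unit variance.

    `features` is assumed to be a list of frames, each frame a list of
    cepstral coefficients of equal length (hypothetical layout).
    """
    n_channels = len(features[0])
    normalized = [[0.0] * n_channels for _ in features]
    for c in range(n_channels):
        # Treat channel c as its own time sequence across the utterance.
        channel = [frame[c] for frame in features]
        mean = fmean(channel)
        std = pstdev(channel)
        # Avoid dividing by zero when a channel is (almost) constant.
        scale = std if std > eps else 1.0
        for t, value in enumerate(channel):
            normalized[t][c] = (value - mean) / scale
    return normalized


if __name__ == "__main__":
    utterance = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
    print(cepstral_mvn(utterance))
```

Each channel is handled independently, which matches the assumption quoted above that the cepstral channels are mostly uncorrelated.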


Intra-frame cepstral sub-band weighting and histogram equalization for noise-robust speech recognition

Hung

Fan

2013

J AUDIO SPEECH MUSIC PROC.


In this paper, we propose a novel noise-robustness method known as weighted sub-band histogram equalization (WS-HEQ) to improve speech recognition accuracy in noise-corrupted environments. Considering the observations that high- and low-pass portions of the intra-frame cepstral features possess unequal importance for noise-corrupted speech recognition, WS-HEQ is intended to reduce the high-pass components of the cepstral features. Furthermore, we provide four types of WS-HEQ, which partially refers to the structure of spatial histogram equalization (S-HEQ). In the experiments conducted on the Aurora-2 noisy-digit database, the presented WS-HEQ yields significant recognition improvements relative to the Mel-scaled filter-bank cepstral coefficient (MFCC) baseline and to cepstral histogram normalization (CHN) in various noise-corrupted situations and exhibits a behavior superior to that of S-HEQ in most cases.
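For orientation, the basic histogram equalization (HEQ) step that WS-HEQ builds on can be sketched as follows. This is plain rank-based HEQ of a single feature sequence toward a standard Gaussian reference, not the weighted sub-band method of the abstract; the function name `histogram_equalize`, the choice of reference distribution, and the `(rank + 0.5) / n` CDF estimate are assumptions made for illustration.

```python
from statistics import NormalDist


def histogram_equalize(channel, reference=NormalDist(0.0, 1.0)):
    """Map one feature sequence so its empirical distribution follows `reference`.

    Each value is replaced by the reference quantile at its empirical CDF
    position. Ties are broken by original order (an illustrative choice).
    """
    n = len(channel)
    # Indices of the samples sorted by value; sorted() is stable.
    order = sorted(range(n), key=lambda i: channel[i])
    equalized = [0.0] * n
    for rank, idx in enumerate(order):
        # Midpoint estimate keeps the CDF value strictly inside (0, 1).
        cdf = (rank + 0.5) / n
        equalized[idx] = reference.inv_cdf(cdf)
    return equalized


if __name__ == "__main__":
    print(histogram_equalize([5.0, 1.0, 3.0]))
```

The mapping is monotonic, so the ordering of the original values is preserved while their distribution is reshaped toward the reference.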



“…To reduce the effect of any present tones which caused by increased variance at random frequencies, Maganti and Matassoni [19] performed variance normalization across the critical bands for spectral subtraction speech enhancement algorithm. The variance is computed as in (…”

Section: Variance Normalization (mentioning)

confidence: 99%

“…Lu [18], employed an optimal smoothing factor, adapted by the variation of signal to spectral deviation ratio in successive frames. Variance normalization was used in spectral subtraction speech enhancement algorithm by Maganti and Matassoni [19] across the critical bands to smoothen the output signal, removing the spikes in the output which reduced the effect of increased variance at random frequencies.…”

Section: Introduction (mentioning)

confidence: 99%