2012
DOI: 10.1109/tasl.2011.2166544

Modulation Spectrum Equalization for Improved Robust Speech Recognition

Citations: cited by 19 publications (5 citation statements)
References: 56 publications
“…The 13-dim static MFCCs plus their first and second order derivatives are then the components of the 39-dimensional feature vector used as the [11]. According to (5), the key component in the presented MSPLE algorithm is the power-law factor . In order to obtain a good selection for this parameter, we use the 8440 utterances (with five SNRs: clean, 20 dB, 15 dB, 10 dB and 5 dB) in the training set for the mode of multi-condition training of the Aurora-2 database as the development set.…”
Section: Methods | Citation type: mentioning
confidence: 99%
“…Initially, the statistics normalization is operated on the temporal domain of speech features, and the respective methods include cepstral mean normalization (CMN) [1], mean and variance normalization (MVN) [2], cepstral histogram normalization (CHN) [3] and MVN plus ARMA filtering (MVA) [4]. Later, the concept of statistics normalization is further used to process the modulation spectral domain of speech features, and the methods of spectral histogram equalization (SHE) [5], magnitude ratio equalization (MRE) [5] and sub-band statistics normalization techniques [6] are accordingly developed. By and large, the pairing of temporal- and modulation spectral-domain methods can give superior performance relative to the component single domain method.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
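
The two simplest temporal-domain methods named in the statement above, CMN and MVN, can be sketched as follows. This is a minimal illustration, not code from the cited papers: it assumes per-utterance statistics, a feature matrix of shape (frames, coefficients), and synthetic data in place of real MFCCs. The function names `cmn` and `mvn` and the `eps` guard are assumptions of this sketch.

```python
import numpy as np

def cmn(features):
    """Cepstral mean normalization: subtract each coefficient's mean over the utterance."""
    return features - features.mean(axis=0, keepdims=True)

def mvn(features, eps=1e-8):
    """Mean and variance normalization: zero mean and unit variance per coefficient.

    eps is a small guard against division by zero (an assumption of this sketch).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    return centered / (features.std(axis=0, keepdims=True) + eps)

# Synthetic stand-in for a 200-frame utterance of 13 cepstral coefficients.
rng = np.random.default_rng(0)
feats = rng.normal(loc=3.0, scale=2.0, size=(200, 13))

feats_cmn = cmn(feats)  # each column now has (numerically) zero mean
feats_mvn = mvn(feats)  # each column now has zero mean and unit variance
```

Both functions operate along the time axis only, so every cepstral coefficient is normalized independently of the others.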
“…To enhance the recognition performance robustness, a lot of approaches have been proposed [2] [3] [4]. A successful category of approaches aims to produce robust features that are less sensitive to environmental mismatch between training and testing conditions.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…The idea of moment normalization in CMS, CMVN and CHN stated above has been adopted to compensate the modulation spectrum [4][5][6][7], which is the discrete Fourier transform of the cepstral time series of an utterance. For instance, in the method of spectral histogram equalization (SHE) [4], all the (magnitude) spectral points in the entire modulation frequency band are treated in a joint manner to estimate a histogram, which is then used to produce a transform for the modulation spectrum and to make the updated modulation spectrum close to a target histogram.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…For instance, in the method of spectral histogram equalization (SHE) [4], all the (magnitude) spectral points in the entire modulation frequency band are treated in a joint manner to estimate a histogram, which is then used to produce a transform for the modulation spectrum and to make the updated modulation spectrum close to a target histogram. SHE can be viewed as the spectral-domain counterpart of CHN, and it effectively reduces the noise effect in the modulation spectrum of cepstral time series.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
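
The statements above describe the modulation spectrum as the discrete Fourier transform of a cepstral time series, and SHE as mapping its magnitudes toward a target histogram. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it uses a rank-based quantile mapping, pools the magnitudes of each cepstral coefficient separately, keeps the original phase, and takes the target histogram from a synthetic "clean" reference. All function names (`modulation_spectrum`, `equalize_magnitudes`, `she_like`) are hypothetical.

```python
import numpy as np

def modulation_spectrum(features):
    """DFT along the time axis of each cepstral coefficient; features has shape (T, D)."""
    return np.fft.rfft(features, axis=0)

def equalize_magnitudes(magnitudes, reference):
    """Map a 1-D array of magnitudes onto the quantiles of a reference sample."""
    ranks = np.argsort(np.argsort(magnitudes))   # rank of each point, 0..N-1
    cdf = (ranks + 0.5) / magnitudes.size        # empirical CDF value in (0, 1)
    return np.quantile(reference, cdf)           # inverse CDF of the reference

def she_like(features, reference_magnitudes):
    """Equalize modulation-spectrum magnitudes, keep the phase, return to the time domain."""
    spec = modulation_spectrum(features)
    mag, phase = np.abs(spec), np.angle(spec)
    new_mag = np.empty_like(mag)
    for d in range(mag.shape[1]):                # one histogram per coefficient (assumption)
        new_mag[:, d] = equalize_magnitudes(mag[:, d], reference_magnitudes[:, d])
    new_spec = new_mag * np.exp(1j * phase)
    return np.fft.irfft(new_spec, n=features.shape[0], axis=0)

# Synthetic stand-ins for a clean reference utterance and a noisy utterance.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

ref_mag = np.abs(modulation_spectrum(clean))
equalized = she_like(noisy, ref_mag)             # same shape as the input features
```

In practice the target histogram would be estimated from training data rather than from a single utterance; the single reference here only keeps the example self-contained.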