2009
DOI: 10.1016/j.specom.2009.02.006

Signal adaptive spectral envelope estimation for robust speech recognition

Abstract: This paper describes a novel spectral envelope estimation technique which adapts to the characteristics of the observed signal. This is possible via the introduction of a second bilinear transformation into warped minimum variance distortionless response (MVDR) spectral envelope estimation. As opposed to the first bilinear transformation, however, which is applied in the time domain, the second bilinear transformation must be applied in the frequency domain. This extension enables the resolution of the spectra…
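As a rough illustration of what a bilinear transformation does to the frequency axis, the sketch below evaluates the phase response of a first-order all-pass section, which is the textbook form of bilinear frequency warping. The function name `warp_frequency` and the warp factor `alpha` are hypothetical names chosen for this example; the paper's own parametrization and its second, frequency-domain transformation are not given in the truncated abstract and are not reproduced here.

```python
import math


def warp_frequency(omega, alpha):
    """Warp a normalized angular frequency through a first-order all-pass.

    omega: frequency in radians per sample, expected in [0, pi].
    alpha: warp factor with |alpha| < 1 (hypothetical name, an assumption
           of this sketch, not taken from the paper).

    Uses the standard all-pass phase relation
        omega_warped = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))).
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy |alpha| < 1")
    # The denominator is strictly positive for |alpha| < 1, so atan2 and atan agree.
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


if __name__ == "__main__":
    # Positive alpha stretches the low-frequency region; alpha = 0 is the identity.
    for alpha in (0.0, 0.4):
        warped = [round(warp_frequency(k * math.pi / 4, alpha), 3) for k in range(5)]
        print(alpha, warped)
```

With a positive warp factor the low frequencies are spread over a wider part of the warped axis, which is the usual sense in which warping trades resolution between low and high frequencies; the end points 0 and π stay fixed.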

Cited by 5 publications (1 citation statement)
References 29 publications
“…We measured seven parameters representing the aeroacoustic characteristics of sibilant /s/, which consist of autocorrelation coefficient, zero crossing count, and five spectral properties: F peak , F′ peak , S p , S′ p , and A d . The autocorrelation coefficient, which is the correlation between adjacent speech signals, is normally close to zero for unvoiced speech, such as sibilant /s/, although for the voiced speech, it is close to 1 because the speech waveform signals are correlated strongly (18). The zero crossing count indicates the frequency at which the energy is concentrated in the spectrum.…”
Section: Methods (mentioning)
confidence: 99%
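The two time-domain measures named in the citation statement can be sketched as follows. This is a minimal illustration under stated assumptions: the excerpt does not give the citing paper's exact definitions, so the sketch assumes a mean-removed, normalized lag-1 autocorrelation and a plain count of sign changes between adjacent samples. The function names are hypothetical.

```python
import math


def lag1_autocorrelation(samples):
    """Normalized correlation between adjacent samples (assumed definition).

    Returns a value in [-1, 1]; 0.0 is returned for a constant signal.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(centered, centered[1:])) / energy


def zero_crossing_count(samples):
    """Number of sign changes between adjacent samples (assumed definition)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))


if __name__ == "__main__":
    # A slowly varying sinusoid stands in for a strongly correlated waveform,
    # and a sample-by-sample alternating signal for a rapidly changing one.
    slow = [math.sin(2 * math.pi * 2 * n / 200) for n in range(200)]
    fast = [1.0, -1.0] * 4
    print(lag1_autocorrelation(slow), zero_crossing_count(slow))
    print(lag1_autocorrelation(fast), zero_crossing_count(fast))
```

The slowly varying signal gives a coefficient near 1 and few zero crossings, while the alternating signal gives a negative coefficient and a crossing at every sample. Noise-like signals such as /s/ would be expected to fall near zero on the first measure, as the statement describes.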