Time and frequency filtering of filter-bank energies for robust HMM speech recognition

Nadeu, Climent; Macho, Dušan; Hernando, Javier

doi:10.1016/s0167-6393(00)00048-0

Cited by 122 publications

(122 citation statements)

References 36 publications


“…Such processing is already under study in auditory neurophysiology (Kowalski et al, 1996a,b; Depireux et al, 2001; Miller et al, 2001, 2002; Escabí and Schreiner, 2002) and psychoacoustics (Chi et al, 1999), and is also being investigated for various signal-processing tasks, including audio coding (Atlas and Shamma, 2003; Klein et al, 2003) and speech recognition (Hermansky, 1999; Nadeu et al, 2001; Kleinschmidt and Gelbart, 2002; Kleinschmidt, 2002).…”

Section: The Linear Processing Of Spectrotemporal Modulation Frequencies (mentioning)

confidence: 99%

Stimulus-invariant processing and spectrotemporal reverse correlation in primary auditory cortex

et al. 2006


The spectrotemporal receptive field (STRF) provides a versatile and integrated, spectral and temporal, functional characterization of single cells in primary auditory cortex (AI). In this paper, we explore the origin of, and relationship between, different ways of measuring and analyzing an STRF. We demonstrate that STRFs measured using a spectrotemporally diverse array of broadband stimuli, such as dynamic ripples, spectrotemporally white noise, and temporally orthogonal ripple combinations (TORCs), are very similar, confirming earlier findings that the STRF is a robust linear descriptor of the cell. We also present a new deterministic analysis framework that employs the Fourier series to describe the spectrotemporal modulations contained in the stimuli and responses. Additional insights into the STRF measurements, including the nature and interpretation of measurement errors, are presented using the Fourier transform, coupled to singular-value decomposition (SVD), and variability analyses including bootstrap. The results promote the utility of the STRF as a core functional descriptor of neurons in AI.


“…It is to be noted that FF-features have previously been shown to yield similar recognition performance as mel-frequency cepstral coefficients (Nadeu et al, 2001). The FF-features were obtained with the following parameter set-up: frames of 32 ms length with a 10 ms shift between the frames were used; both preemphasis and Hamming window were applied to each frame; the short-time magnitude spectra, obtained by applying the FFT, were passed to Mel-spaced filter-bank analysis with 20 channels; the obtained logarithm filter-bank energies were then filtered using the filter H(z) = z - z^{-1} (Nadeu et al, 2001). A feature vector consisting of 18 elements was obtained (the edge values were excluded).…”

Section: Acoustic Modelling (mentioning)

confidence: 93%
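
The parameter set-up quoted above can be sketched in a few lines of NumPy. This is an illustrative reading of the quoted description, not the citing authors' code. The 8 kHz sample rate, the 0.97 preemphasis coefficient, the triangular filter shape, the mel-scale formula and the function names (`mel_filterbank`, `ff_features`) are all assumptions that the quoted text does not specify. The filter H(z) = z - z^{-1} is applied along the channel index k, so each output is y[k] = x[k+1] - x[k-1], and dropping the two edge channels leaves 18 values per frame.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (assumed; the quoted text does not give one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_channels, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(
        np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_channels + 2)
    )
    bin_freqs = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    bank = np.zeros((n_channels, n_fft // 2 + 1))
    for ch in range(n_channels):
        lo, mid, hi = edges_hz[ch], edges_hz[ch + 1], edges_hz[ch + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[ch] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def ff_features(signal, sample_rate=8000, frame_ms=32, shift_ms=10,
                n_channels=20, preemph=0.97):
    """Frequency-filtered log filter-bank energies, one row per frame."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000))
    shift = int(round(sample_rate * shift_ms / 1000))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Preemphasis: s[n] - preemph * s[n-1] (coefficient is an assumption).
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window to each.
    n_frames = 1 + (len(emphasized) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # Short-time magnitude spectrum, then mel filter-bank and logarithm.
    magnitude = np.abs(np.fft.rfft(frames, n=frame_len, axis=1))
    bank = mel_filterbank(n_channels, frame_len, sample_rate)
    log_energies = np.log(magnitude @ bank.T + 1e-10)  # small floor avoids log(0)

    # H(z) = z - z^{-1} across channels: y[k] = x[k+1] - x[k-1].
    # Only interior channels are kept, so 20 channels yield 18 features.
    return log_energies[:, 2:] - log_energies[:, :-2]
```

Under these assumptions, a one-second signal of 8000 samples produces an array of shape (97, 18): 97 frames, each with 18 frequency-filtered values.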

“…The frequency-filtered logarithm filter-bank energies (Nadeu et al, 2001) (referred here as FF-features) were used as speech feature representation due to their suitability for missing-feature based recognition. It is to be noted that FF-features have previously been shown to yield similar recognition performance as mel-frequency cepstral coefficients (Nadeu et al, 2001).…”

Section: Acoustic Modelling (mentioning)

confidence: 99%

“…Current frame-based speech representations for speech pattern processing, with the mel-frequency cepstral coefficients (MFCCs) (Davis and Mermelstein, 1980) and the frequency-filtered logarithm filter-bank energies (Nadeu et al, 2001) being among the most successful, typically aim at representing the characteristics of the vocal-tract filter. The use of voicing information in speech recognition was suggested in the 1970s (Rabiner and Sambur, 1976).…”

Section: Introduction (mentioning)

confidence: 99%


Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments

Jančovič and Köküer, 2009, Speech Communication

In this paper, we propose a model for the incorporation of voicing information into a speech recognition system in noisy environments. The employed voicing information is estimated by a novel method that can provide this information for each filterbank channel and does not require information about the fundamental frequency. The voicing information is modelled by employing the Bernoulli distribution. The voicing model is obtained for each HMM state and mixture by a Viterbi-style training procedure. The proposed voicing incorporation is evaluated both within a standard model and two other models that had compensated for the noise effect, the missing-feature and the multi-conditional training model. Experiments are first performed on noisy speech data from the Aurora 2 database. Significant performance improvements are achieved when the voicing information is incorporated within the standard model as well as the noise-compensated models. The employment of voicing information is also demonstrated on a phoneme recognition task on the noise-corrupted TIMIT database and considerable improvements are observed.


“…In this paper, we focus on ASR systems that use Frequency Filtered (FF) parameters (Nadeu et al (1995, 2001); Paliwal (1999)). This parameterization performs as well as the parameterizations in the cepstral domain such as the Mel-frequency cepstral coefficients (MFCC) and has the additional advantage of staying in the log-frequency domain.…”

Section: Introduction (mentioning)

confidence: 99%

Uncertainty decoding on Frequency Filtered parameters for robust ASR

Vicente-Peña and Díaz-de-María, 2010, Speech Communication

The use of feature enhancement techniques to obtain estimates of the clean parameters is a common approach for robust automatic speech recognition (ASR). However, the decoding algorithm typically ignores how accurate these estimates are. Uncertainty decoding methods incorporate this type of information. In this paper, we develop a formulation of the uncertainty decoding paradigm for Frequency Filtered (FF) parameters using spectral subtraction as a feature enhancement method. Additionally, we show that the uncertainty decoding method for FF parameters admits a simple interpretation as a spectral weighting method that assigns more importance to the most reliable spectral components. Furthermore, we suggest combining this method with SSBD-HMM (Spectral Subtraction and Bounded Distance HMM), one recently proposed technique that is able to compensate for the effects of features that are highly contaminated (outliers). This combination pursues two objectives: to improve the results achieved by uncertainty decoding methods and to determine which part of the improvements is due to compensating for the effects of outliers and which part is due to compensating for other less deteriorated features.
