An effective way to increase the noise robustness of automatic speech recognition is to label noisy speech features as either reliable or unreliable (missing), and to replace (impute) the missing ones by clean speech estimates. Conventional imputation techniques employ parametric models and impute the missing features on a frame-by-frame basis. At low SNRs these techniques fail, because too many time frames may contain few, if any, reliable features. In this paper we introduce a novel non-parametric, exemplar-based method for reconstructing clean speech from noisy observations, based on techniques from the field of Compressive Sensing. The method, dubbed sparse imputation, can impute missing features using larger time windows such as entire words. Using an overcomplete dictionary of clean speech exemplars, the method finds the sparsest combination of exemplars that jointly approximates the reliable features of a noisy utterance. That linear combination of clean speech exemplars is used to replace the missing features. Recognition experiments on noisy isolated digits show that sparse imputation outperforms conventional imputation techniques at SNR = -5 dB when using an ideal 'oracle' mask. With error-prone estimated masks, sparse imputation performs slightly worse than the best conventional technique.
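The core reconstruction step can be sketched as follows; this is a minimal illustration, assuming an L1-regularised solver (scikit-learn's Lasso) for the sparse recovery and a vectorised spectrogram window as the feature representation, neither of which is specified by the abstract itself.

```python
# Minimal sketch of sparse imputation (illustrative assumptions: Lasso as
# the sparse solver, vectorised spectrogram windows as features).
import numpy as np
from sklearn.linear_model import Lasso

def sparse_impute(noisy, reliable_mask, dictionary, alpha=0.01):
    """Reconstruct the missing features of one noisy utterance window.

    noisy         : (d,) vectorised feature window of the noisy utterance
    reliable_mask : (d,) boolean array, True where features are reliable
    dictionary    : (d, n) matrix whose columns are vectorised clean exemplars
    """
    # Fit a sparse, non-negative combination of exemplars using only the
    # reliable rows of the dictionary.
    solver = Lasso(alpha=alpha, positive=True, fit_intercept=False,
                   max_iter=5000)
    solver.fit(dictionary[reliable_mask], noisy[reliable_mask])
    # Re-synthesise the full window from clean exemplars ...
    estimate = dictionary @ solver.coef_
    # ... keeping the observed reliable features and imputing only the rest.
    return np.where(reliable_mask, noisy, estimate)
```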
This paper gives a survey of frequency domain identification methods for rational transfer functions in the Laplace (s) or z-domain. The interrelations between the different approaches are highlighted through a study of the (equivalent) cost functions. The properties of the various estimators are discussed and illustrated by several examples.
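As one concrete instance of the estimators such a survey covers, the classic linearised least-squares fit of Levy can be sketched as below; the model orders and the absence of frequency weighting are simplifying assumptions, and the survey's other estimators differ mainly in how they weight or iterate this cost function.

```python
# Sketch of Levy's linearised least-squares estimator for a rational
# transfer function B(s)/A(s) from frequency response data (illustrative;
# no frequency weighting or iteration is applied).
import numpy as np

def levy_fit(omega, G, nb, na):
    """Fit G(jw) ~ B(jw)/A(jw), with A monic, by linear least squares.

    omega : (K,) measurement frequencies in rad/s
    G     : (K,) complex frequency response measurements
    nb,na : orders of numerator B and denominator A
    """
    s = 1j * omega
    # Linearised residual B(s_k) - A(s_k) G_k ~ 0 with a_0 = 1 gives columns
    # for b_0..b_nb and a_1..a_na, and right-hand side G_k.
    M = np.hstack([np.vander(s, nb + 1, increasing=True),
                   -G[:, None] * np.vander(s, na + 1, increasing=True)[:, 1:]])
    # Solve the complex least-squares problem via real/imaginary stacking.
    Mr = np.vstack([M.real, M.imag])
    yr = np.concatenate([G.real, G.imag])
    theta, *_ = np.linalg.lstsq(Mr, yr, rcond=None)
    b = theta[:nb + 1]
    a = np.concatenate([[1.0], theta[nb + 1:]])
    return b, a  # ascending-power coefficients of B and A
```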
The objective of this paper is threefold: (1) to provide an extensive review of signal subspace speech enhancement, (2) to derive an upper bound for the performance of these techniques, and (3) to present a comprehensive study of the potential of subspace filtering to increase the robustness of automatic speech recognisers against stationary additive noise distortions. Subspace filtering methods are based on the orthogonal decomposition of the noisy speech observation space into a signal subspace and a noise subspace. This decomposition is possible under the assumption of a low-rank model for speech and the availability of an estimate of the noise correlation matrix. We present an extensive overview of the available estimators, and derive a theoretical estimator to experimentally assess an upper bound on the performance that can be achieved by any subspace-based method. Automatic speech recognition (ASR) experiments with noisy data demonstrate that subspace-based speech enhancement can significantly increase the robustness of these systems in additive coloured noise environments. Optimal performance is obtained only if no explicit rank reduction of the noisy Hankel matrix is performed. Although this strategy might increase the level of the residual noise, it reduces the risk of removing signal information that is essential for the recogniser's back end. Finally, it is also shown that subspace filtering compares favourably to the well-known spectral subtraction technique.
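A minimal per-frame sketch of the subspace idea is given below; the white-noise assumption, known noise variance, and Wiener-type eigenvalue gain are simplifications relative to the coloured-noise estimators the paper surveys.

```python
# Minimal sketch of signal-subspace enhancement under white noise
# (illustrative; the surveyed estimators handle coloured noise and
# other gain rules).
import numpy as np

def subspace_enhance(frames, noise_var):
    """Enhance noisy speech frames via an eigen-domain Wiener-type gain.

    frames    : (N, L) matrix of length-L analysis frames of noisy speech
    noise_var : estimated variance of the additive white noise
    """
    # Empirical covariance of the noisy observations.
    R_y = frames.T @ frames / frames.shape[0]
    eigvals, V = np.linalg.eigh(R_y)
    # Estimated clean-speech eigenvalues; components at or below the noise
    # floor get zero gain, which implicitly splits signal and noise subspaces.
    clean = np.maximum(eigvals - noise_var, 0.0)
    gains = clean / np.maximum(eigvals, 1e-12)
    H = V @ np.diag(gains) @ V.T  # linear estimator H = V G V^T
    return frames @ H.T
```

Note that keeping all components with non-zero gain, rather than truncating to a fixed rank, mirrors the paper's finding that explicit rank reduction of the noisy Hankel matrix is best avoided for ASR.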
Modeling the relationship between natural speech and a recorded electroencephalogram (EEG) helps us understand how the brain processes speech and has various applications in neuroscience and brain-computer interfaces. In this context, so far mainly linear models have been used. However, the decoding performance of the linear model is limited due to the complex and highly non-linear nature of auditory processing in the human brain. We present a novel Long Short-Term Memory (LSTM)-based architecture as a nonlinear model for the classification problem of whether a given (EEG, speech envelope) pair corresponds to each other or not. The model maps short segments of the EEG and the envelope to a common embedding space using a CNN in the EEG path and an LSTM in the speech path. The latter also compensates for the brain response delay. In addition, we use transfer learning to fine-tune the model for each subject. The mean classification accuracy of the proposed model reaches 85%, which is significantly higher than that of a state-of-the-art Convolutional Neural Network (CNN)-based model (73%) and the linear model (69%).
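An illustrative PyTorch sketch of such a two-path match/mismatch model follows; the layer sizes, pooling choice, and cosine-similarity readout are assumptions for illustration, not the exact published architecture.

```python
# Illustrative sketch of a two-path (EEG, envelope) match/mismatch model
# (assumed layer sizes and similarity readout; not the published design).
import torch
import torch.nn as nn

class MatchMismatchNet(nn.Module):
    def __init__(self, eeg_ch=64, emb=32):
        super().__init__()
        # EEG path: 1-D CNN over time, pooled to a fixed-size embedding.
        self.eeg_cnn = nn.Sequential(
            nn.Conv1d(eeg_ch, emb, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Speech path: LSTM over the envelope; its recurrence can absorb
        # the delay between the stimulus and the brain response.
        self.env_lstm = nn.LSTM(input_size=1, hidden_size=emb,
                                batch_first=True)
        self.out = nn.Linear(1, 1)

    def forward(self, eeg, env):
        # eeg: (batch, channels, time); env: (batch, time, 1)
        e = self.eeg_cnn(eeg).squeeze(-1)      # (batch, emb)
        _, (h, _) = self.env_lstm(env)
        s = h[-1]                              # (batch, emb)
        # Similarity in the common embedding space -> match probability.
        sim = nn.functional.cosine_similarity(e, s).unsqueeze(-1)
        return torch.sigmoid(self.out(sim))    # (batch, 1)
```

Per-subject fine-tuning would then amount to continuing training of a pretrained instance of this network on each subject's own data.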
We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve directly as evidence for the underlying event classes. The atoms in the dictionary span multiple frames and are created by extracting all possible fixed-length exemplars from the training data. To combat the data scarcity of small training sets, we propose to artificially augment the training data by linear time warping in the feature domain at multiple rates. The method is evaluated on the Office Live and Office Synthetic datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
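The activation-to-evidence step can be sketched as follows; the multiplicative Euclidean updates with a fixed dictionary and the per-class pooling of activation weights are an illustrative simplification of the method described above.

```python
# Sketch of exemplar-based event detection with a fixed non-negative
# dictionary (illustrative: Euclidean multiplicative updates, summed
# per-class activations as class evidence).
import numpy as np

def event_evidence(obs, dictionary, atom_labels, n_classes, n_iter=200):
    """Score event classes for one multi-frame observation window.

    obs        : (d,) non-negative vectorised multi-frame feature window
    dictionary : (d, n) columns are vectorised fixed-length clean exemplars
    atom_labels: (n,) integer event-class index of each dictionary atom
    """
    rng = np.random.default_rng(0)
    x = rng.random(dictionary.shape[1])  # non-negative atom activations
    for _ in range(n_iter):
        # Multiplicative update for min ||obs - D x||^2 subject to x >= 0.
        x *= (dictionary.T @ obs) / (dictionary.T @ (dictionary @ x) + 1e-12)
    # The summed activation mass of each class's atoms serves as evidence.
    return np.bincount(atom_labels, weights=x, minlength=n_classes)
```

The proposed time-warping augmentation would enter here simply as extra dictionary columns: each training exemplar is resampled along the time axis at several rates before vectorisation.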