Non-linear mapping for multi-channel speech separation and robust overlapping speech recognition

Liu, Weifeng; Dines, John; Magimai.-Doss, Mathew; Bourlard, Hervé

doi:10.1109/icassp.2009.4960485

Cited by 5 publications

(3 citation statements)

References 12 publications

(13 reference statements)

“…The mapping is performed between features extracted from noisy and clean speech signals to obtain an optimal set of parameters through the error backpropagation algorithm [52,53,54]. The goal is to obtain clean or enhanced speech from the noisy input via a nonlinear transformation using neural networks such as a deep denoising autoencoder [55] or a multilayer perceptron (MLP) [56].…”

Section: Feature Mapping Techniques Using DNN (mentioning)

confidence: 99%

Feature mapping using far-field microphones for distant speech recognition

Himawan

Motlíček

Sridharan

2016

Speech Communication

Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel deep neural network based approaches rely on the powerful modeling capability of a deep neural network (DNN) to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, features from each channel are concatenated and used together as input to the DNN. This allows integrating the multi-channel audio for acoustic modeling without any pre-processing steps. Despite the powerful modeling capabilities of DNNs, an environmental mismatch due to noise and reverberation may result in severe performance degradation when features are simply fed to a DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach using a DNN to transform noisy and reverberant features to their clean version. The bottleneck features trained on the clean signal are used as a teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques along with the feature mapping are examined for distant large vocabulary speech recognition, using single and multiple far-field microphones. As an alternative to beamforming, experiments with concatenating multiple channel features are conducted. The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation, yields a distant speech recognition performance below 50% word error rate (WER), using a DNN for acoustic modeling.
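The feature mapping idea in this abstract (concatenate per-channel features, then learn a nonlinear map toward clean-speech targets by backpropagation) can be illustrated with a minimal sketch. This is not the authors' system: the one-hidden-layer network, the synthetic "features", and all names (`train_feature_mapper`, `apply_mapper`, `chan_a`, `chan_b`) are assumptions chosen for illustration, and raw clean features stand in for the bottleneck teacher signal described above.

```python
import numpy as np

def train_feature_mapper(noisy, clean, hidden=16, lr=0.1, epochs=300, seed=0):
    """Fit a one-hidden-layer MLP (tanh) mapping noisy features to clean ones.

    Hypothetical helper: plain batch gradient descent on mean squared error.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = noisy.shape[1], clean.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(noisy @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - clean
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (error backpropagation of the MSE loss)
        g_pred = 2.0 * err / err.size
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = noisy.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient descent update
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses

def apply_mapper(params, noisy):
    """Map noisy features to enhanced features with the trained MLP."""
    W1, b1, W2, b2 = params
    return np.tanh(noisy @ W1 + b1) @ W2 + b2

# Synthetic stand-ins for real speech features (assumption: 200 frames, 4 dims).
rng = np.random.default_rng(1)
clean = rng.normal(size=(200, 4))
chan_a = clean + 0.3 * rng.normal(size=clean.shape)        # distant mic 1
chan_b = 0.8 * clean + 0.3 * rng.normal(size=clean.shape)  # distant mic 2

# Concatenate the per-channel features along the feature axis.
noisy = np.concatenate([chan_a, chan_b], axis=1)

params, losses = train_feature_mapper(noisy, clean)
enhanced = apply_mapper(params, noisy)
```

In a real system the inputs would be acoustic features from several far-field microphones and the network would be considerably deeper; the sketch only shows the shape of the training loop.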

“…In the Pascal Speech Separation Challenge [6], recognizing a target speech in the presence of another talker's speech was evaluated in a monaural scenario. A multichannel approach has also been studied [7]-[11] because it is more effective for speech separation.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage

2016

In this paper, we present a multi-talker speech recognition system based on blind source separation with an ad hoc microphone array, which consists of smartphones and cloud storage. In this system, a mixture of voices from multiple speakers is recorded by each speaker's smartphone, which is automatically transferred to online cloud storage. Our prototype system is realized using iPhone and Dropbox. Although the signals recorded by different iPhones are not synchronized, the blind synchronization technique compensates both the differences in the time offset and the sampling frequency mismatch. Then, auxiliary-function-based independent vector analysis separates the synchronized mixture into each speaker's voice. Finally, automatic speech recognition is applied to transcribe the speech. By experimental evaluation of the multi-talker speech recognition system using Julius, we confirm that it effectively reduces the speech overlap and improves the speech recognition performance.
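One part of the synchronization step mentioned in this abstract, compensating a time offset between unsynchronized recordings, can be sketched with a plain cross-correlation. This is a simplified stand-in, not the blind synchronization technique used in the paper: it handles only an integer sample offset and ignores the sampling frequency mismatch, and the function name `estimate_offset` and the synthetic signals are assumptions.

```python
import numpy as np

def estimate_offset(ref, other):
    """Estimate by how many samples `other` lags behind `ref`.

    Hypothetical helper: picks the lag that maximizes the cross-correlation.
    A positive result means `other` is delayed relative to `ref`.
    """
    corr = np.correlate(other, ref, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(ref) - 1).
    return int(np.argmax(corr)) - (len(ref) - 1)

# Synthetic example: the second device starts recording 37 samples late.
rng = np.random.default_rng(0)
ref = rng.normal(size=1000)
true_delay = 37
other = np.concatenate([np.zeros(true_delay), ref])[: len(ref)]
other = other + 0.01 * rng.normal(size=len(ref))  # small sensor noise

lag = estimate_offset(ref, other)

# Align by dropping the leading samples of the delayed signal
# (this simple alignment assumes a non-negative lag).
aligned = other[lag:]
```

After alignment, `aligned` overlaps sample for sample with the start of `ref`, which is the precondition a separation stage such as independent vector analysis would need.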

“…recognizing speech from multiple distant microphones (multichannel) for multiparty meetings where more than one speaker can be active at the same time). The basic idea [210]-[211] to achieve this is to find a mapping (by a neural network or some regression analysis) between the log FBEs of signals from distant microphones and the log FBEs of clean signal. We therefore expect that the FBEs provide a reasonably effective and discriminative representation space of the speech signal towards differentiating the effects of noise injection and noise-suppression (i.e.…”

Section: Suppressed Speech (mentioning)

confidence: 99%

Learning based signal quality assessment for multimedia communications

Narwaria

Multimedia contents (including image/video, speech, audio, graphic and so on) can be affected by a wide variety of distortions during the process of acquisition, compression, processing, transmission, and reproduction which generally leads to loss of perceptual quality. As a result, signal quality assessment is an important component in today's multimedia communication systems. In this thesis, perceptual quality assessment algorithms are proposed for three important types of multimedia signals, namely image, video, and speech. This involves two crucial stages: (a) feature extraction/detection, and (b) feature pooling.
