2017
DOI: 10.1109/taslp.2017.2672401

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

Cited by 199 publications (83 citation statements)
References 28 publications

“…Training of neural networks that operate on the raw signals and are optimized for the discriminative cost function of the acoustic model has also been explored recently. These approaches are termed Neural Beamforming approaches, as the neural network acoustic model subsumes the functionality of the beamformer [20,21].…”
Section: Related Prior Work (mentioning)
confidence: 99%
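
To make the neural beamforming idea in the statement above concrete, the following minimal sketch (PyTorch; the class name, filter counts, and kernel sizes are illustrative assumptions, not values from the paper) learns a single multichannel convolution over raw waveforms. Trained jointly with the acoustic model on the ASR loss, such a layer plays the role of a data-driven filter-and-sum beamformer.

```python
# Minimal sketch (assumed shapes, not the paper's exact configuration):
# a single multichannel Conv1d over raw waveforms acts as a learned
# filter-and-sum front-end; joint training lets the ASR loss shape it.
import torch
import torch.nn as nn

class NeuralBeamformerFrontEnd(nn.Module):
    def __init__(self, num_channels=2, num_filters=128, kernel_size=400, hop=160):
        super().__init__()
        # Each output filter spans all microphone channels, so it combines
        # spatial (across-channel) and spectral (within-channel) filtering.
        self.spatio_temporal = nn.Conv1d(
            in_channels=num_channels,
            out_channels=num_filters,
            kernel_size=kernel_size,  # ~25 ms at 16 kHz
            stride=hop,               # ~10 ms frame shift
        )

    def forward(self, waveforms):
        # waveforms: (batch, channels, samples) of raw multichannel audio
        activations = self.spatio_temporal(waveforms)
        # Rectify and log-compress, a common stand-in for a filterbank output.
        return torch.log1p(torch.relu(activations))

# The resulting features would feed the acoustic model, and both parts would
# be optimized with the same discriminative ASR objective.
frontend = NeuralBeamformerFrontEnd()
features = frontend(torch.randn(4, 2, 16000))  # -> (4, 128, frames)
```
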
“…Prior work has shown that learning a multi-channel front-end jointly with the AM using the ASR objective can improve far-field performance. In [8], Sainath et al. showed that input from a data-driven multi-channel front-end provides better results than both single-channel and beamformed input. They introduced a set of convolutional filters applied directly to the raw audio [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…In [8], Sainath et al. showed that input from a data-driven multi-channel front-end provides better results than both single-channel and beamformed input. They introduced a set of convolutional filters applied directly to the raw audio [8]. The convolutional and linear structures are both designed to explicitly incorporate multiple beamformer "look directions", subsuming a multi-geometry beamforming component into the deep neural network (DNN).…”
Section: Introduction (mentioning)
confidence: 99%
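
The "look direction" structure mentioned in these statements can be pictured as a factored front-end: a first spatial layer with one short multichannel filter per hypothesized direction, followed by longer spectral filters shared across directions and pooled over directions. The sketch below is only an approximation under assumed dimensions (10 look directions, 128 spectral filters), not the exact configuration of [8].

```python
# Hedged sketch of a factored "look direction" front-end: P short multichannel
# spatial filters (one per hypothesized direction), then longer spectral
# filters shared across directions, max-pooled over directions.
# All dimensions are assumptions for illustration.
import torch
import torch.nn as nn

class LookDirectionFrontEnd(nn.Module):
    def __init__(self, num_channels=2, num_directions=10,
                 spatial_kernel=81, num_spectral=128,
                 spectral_kernel=400, hop=160):
        super().__init__()
        # Spatial layer: each output channel is one learned filter-and-sum
        # beamformer, i.e. one "look direction".
        self.spatial = nn.Conv1d(num_channels, num_directions,
                                 kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2)
        # Spectral layer: single-channel filters shared across all directions.
        self.spectral = nn.Conv1d(1, num_spectral,
                                  kernel_size=spectral_kernel, stride=hop)
        self.num_spectral = num_spectral

    def forward(self, waveforms):
        # waveforms: (batch, channels, samples)
        beams = self.spatial(waveforms)             # (batch, P, samples)
        b, p, n = beams.shape
        beams = beams.reshape(b * p, 1, n)          # treat each beam separately
        feats = torch.relu(self.spectral(beams))    # (batch*P, F, frames)
        feats = feats.reshape(b, p, self.num_spectral, -1)
        # Pool over look directions so the DNN acoustic model sees one map.
        return torch.log1p(feats.max(dim=1).values) # (batch, F, frames)

frontend = LookDirectionFrontEnd()
features = frontend(torch.randn(4, 2, 16000))       # -> (4, 128, frames)
```
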
“…In recent years, deep learning techniques have significantly improved speech recognition accuracy [4,5,6,7,8]. This improvement has come from the shift from Gaussian Mixture Models (GMMs) to Feed-Forward Deep Neural Networks (FF-DNNs), from FF-DNNs to Recurrent Neural Networks (RNNs), and in particular to Long Short-Term Memory (LSTM) networks [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…LibriSpeech LM corpus. The best performance was achieved when the window length was 50 ms and the warping coefficients were uniformly distributed between 0.8 and 1.2.…”
mentioning
confidence: 99%
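
As a small illustration of the quoted hyperparameter choice, the sketch below draws a warping coefficient uniformly from [0.8, 1.2] and applies it as a simple linear resampling of the waveform. The actual warping function used by the cited work is not given in the excerpt, so this particular form is an assumption.

```python
# Illustrative sketch only: draw a warping coefficient uniformly from
# [0.8, 1.2] and apply it as a simple linear resampling of the waveform.
# The exact warping function of the cited work is not specified in the excerpt.
import numpy as np

def sample_warp_coefficient(rng, low=0.8, high=1.2):
    """Draw one warping coefficient uniformly from [low, high]."""
    return rng.uniform(low, high)

def warp_waveform(waveform, alpha):
    """Resample by factor alpha; alpha > 1 shortens (speeds up) the signal."""
    n = len(waveform)
    positions = np.arange(n) * alpha          # where each output sample reads from
    positions = positions[positions < n - 1]  # stay inside the input
    return np.interp(positions, np.arange(n), waveform)

rng = np.random.default_rng(0)
alpha = sample_warp_coefficient(rng)
warped = warp_waveform(np.random.randn(16000), alpha)
```
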