2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8461370

Multichannel Speech Separation with Recurrent Neural Networks from High-Order Ambisonics Recordings

Cited by 35 publications (21 citation statements)
References 20 publications (29 reference statements)
“…In this context, it is important to know the directions of arrival (DoAs) of the sounds, in order either to enhance the signals of interest or to reproduce the sound scene properly. For instance, DoA estimation is essential for speech enhancement and robust far-field automatic speech recognition in scenarios involving overlapping speakers [1]-[5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Heymann et al predicted TF masks out of a single signal of the microphone array [16]. Perotin et al [22] or Chakrabarty and Habets [21] included several other signals to improve the speech recognition or speech enhancement performance. We propose to extend these scenarios to the multi-node context of DANSE.…”
Section: Deep Neural Network Based Distributed Multichannel Wiener Fi… (mentioning)
confidence: 99%
“…This yields better results than single-channel prediction but combining all the sensor signals is not scalable and seems suboptimal because of the redundancy of the data. Coping with the redundancy, Perotin et al [22] combined a single estimate of the source signals with the input mixture and used the resulting tensor to train a long short-term memory (LSTM) recurrent neural network (RNN).…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, speech enhancement is advanced by the use of a deep neural network (DNN) to estimate a T-F mask. For effectively modelling a speech signal which is time-sequential data, a recurrent neural network (RNN) is used in various speech signal processing applications [1][2][3][4][5][6][7][8][9][10][11][12][13][14].…”
Section: Introduction (mentioning)
confidence: 99%