2021 IEEE Spoken Language Technology Workshop (SLT) 2021
DOI: 10.1109/slt48900.2021.9383522

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

Cited by 33 publications (14 citation statements)
References 35 publications
“…We train separation networks using the same architecture as previous works [6,8,9,10], which separates sources by masking in a learned transform domain. The network is composed of a learnable encoder/decoder with 2.5 ms window and 1.25 ms hop, combined with a time-domain convolutional network (TDCN++).…”
Section: Methods (citation type: mentioning)
confidence: 99%
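To make the quoted window and hop sizes concrete, the following minimal sketch converts them to sample counts and counts the frames an encoder would produce. The excerpt does not state a sample rate, so the 16 kHz value is an assumption, and the helper names `ms_to_samples` and `num_frames` are hypothetical, not taken from the cited work.

```python
def ms_to_samples(duration_ms: float, sample_rate_hz: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def num_frames(num_samples: int, window: int, hop: int) -> int:
    """Number of full frames when sliding `window` by `hop` with no padding."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


# Assumption: 16 kHz audio. The quoted excerpt gives only the durations.
SAMPLE_RATE_HZ = 16000

WINDOW = ms_to_samples(2.5, SAMPLE_RATE_HZ)   # encoder window length in samples
HOP = ms_to_samples(1.25, SAMPLE_RATE_HZ)     # encoder hop in samples

if __name__ == "__main__":
    one_second = SAMPLE_RATE_HZ
    print(WINDOW, HOP, num_frames(one_second, WINDOW, HOP))
```

A hop of half the window gives 50% overlap between consecutive frames; at a different sample rate the sample counts scale proportionally.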
“…Our training configurations are illustrated in Figure 1. For supervised data, we use anechoic and reverberant versions of Libri2Mix [19,20]. The anechoic version is the official clean two-speaker mixtures, and the reverberant version RLibri2Mix [13] uses synthetic impulse responses using a simulator described in previous work [20].…”
Section: Experiments Setup (citation type: mentioning)
confidence: 99%
“…The room simulation is based on the image method with frequency-dependent wall filters and is described in [24]. A simulated room with width between 3-7 meters, length between 4-8 meters, and height between 2.13-3.05 meters is sampled for each mixture, with a random microphone location, and the sources in the clip are each convolved with an impulse response from a different randomly sampled location within the simulated room.…”
Section: Data Preparation (citation type: mentioning)
confidence: 99%
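The room sampling described in this excerpt can be sketched roughly as follows. This covers only the geometry: the image-method simulation and the frequency-dependent wall filters are not reproduced. The excerpt does not state the sampling distribution, so uniform sampling is an assumption, and the names `Room`, `sample_room`, `sample_location`, and `sample_scene` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


@dataclass
class Room:
    width: float   # meters
    length: float  # meters
    height: float  # meters


def sample_room(rng: random.Random) -> Room:
    """Draw room dimensions from the ranges quoted above (uniform is assumed)."""
    return Room(
        width=rng.uniform(3.0, 7.0),
        length=rng.uniform(4.0, 8.0),
        height=rng.uniform(2.13, 3.05),
    )


def sample_location(room: Room, rng: random.Random) -> Point:
    """Draw a random point inside the room (uniform is assumed)."""
    return (
        rng.uniform(0.0, room.width),
        rng.uniform(0.0, room.length),
        rng.uniform(0.0, room.height),
    )


def sample_scene(num_sources: int, seed: Optional[int] = None):
    """Sample one room, one microphone location, and one location per source."""
    rng = random.Random(seed)
    room = sample_room(rng)
    mic = sample_location(room, rng)
    # Each source gets its own independently sampled location.
    sources: List[Point] = [sample_location(room, rng) for _ in range(num_sources)]
    return room, mic, sources


if __name__ == "__main__":
    room, mic, sources = sample_scene(num_sources=2, seed=0)
    print(room, mic, sources)
```

In a full pipeline, each source signal would then be convolved with the impulse response simulated between its location and the microphone.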