2021
DOI: 10.48550/arxiv.2110.11844
Preprint

Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

Abstract: Deep neural networks (DNNs) have been successfully used for multichannel speech enhancement in fixed array geometries. However, challenges remain for ad-hoc arrays with unknown microphone placements. We propose a deep neural network based approach for ad-hoc array processing: Triple-Attentive Dual-Recurrent Network (TADRN). TADRN uses self-attention across channels for learning spatial information and a dual-path attentive recurrent network (ARN) for temporal modeling. Temporal modeling is done independently f…
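
The abstract says that TADRN applies self-attention across channels to learn spatial information. The following is a minimal sketch of that general idea only, not the authors' architecture: plain scaled dot-product self-attention over the microphone axis in NumPy. The function name `channel_self_attention`, the feature sizes, and the random projection matrices are all assumptions made for illustration; in a real model the projections would be learned. The sketch shows why attention over channels suits ad-hoc arrays: it accepts any number of microphones and does not depend on their order.

```python
import numpy as np

def channel_self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the channel (microphone) axis.

    x: array of shape (channels, features), one feature vector per microphone.
    w_q, w_k, w_v: projection matrices of shape (features, d). These are
    hypothetical stand-ins for weights that a real model would learn.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (channels, channels)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over channels
    return weights @ v                             # (channels, d)

rng = np.random.default_rng(0)
features, d = 16, 8                                # illustrative sizes
w_q, w_k, w_v = (rng.standard_normal((features, d)) for _ in range(3))

# Four microphones, each summarized by a 16-dimensional feature vector.
x = rng.standard_normal((4, features))
out = channel_self_attention(x, w_q, w_k, w_v)

# Reordering the microphones only reorders the outputs in the same way.
perm = rng.permutation(4)
out_perm = channel_self_attention(x[perm], w_q, w_k, w_v)

# The same weights work unchanged for a different number of microphones.
out_six = channel_self_attention(rng.standard_normal((6, features)), w_q, w_k, w_v)
```

Because the attention weights are computed from the channel features themselves, nothing in the sketch encodes a fixed array geometry or channel count.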

Cited by 2 publications (1 citation statement)
References 18 publications

“…There are various methods to incorporate the effect of phase into speech enhancement. One approach is to directly process the time-domain speech signal [18] by framing it, not extracting the speech's time-frequency information, and using neural networks for speech enhancement in the time domain, directly generating sequences of speech. In frequency-domain speech enhancement, the complex speech spectrum can be taken as input for the speech enhancement network, utilizing a complex-valued speech enhancement network for enhancement [19].…”
Section: Introduction (mentioning)
confidence: 99%
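
The citation statement describes time-domain enhancement as framing the waveform directly, without extracting time-frequency features. A minimal sketch of that framing step is shown below. The helper name `frame_signal`, the frame length, and the hop size are illustrative assumptions; neither the preprint's abstract nor the citing text specifies them.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D waveform into overlapping frames.

    Assumes len(signal) >= frame_len. Trailing samples that do not fill
    a complete frame are dropped. Returns shape (n_frames, frame_len).
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices i*hop, i*hop + 1, ..., i*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

# Toy waveform of 10 samples, frames of 4 samples with a hop of 2 (50% overlap).
waveform = np.arange(10.0)
frames = frame_signal(waveform, frame_len=4, hop=2)
```

A time-domain network would consume `frames` as its input sequence and emit enhanced frames, which are then recombined into a waveform.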