ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413955

Multi-Channel Speech Enhancement Using Graph Neural Networks

Abstract: Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space and, specifically, a graph. This formulation allows us t…
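To make the graph formulation in the abstract concrete, the sketch below (not the authors' code; the module names, layer sizes, and the mask-based output are illustrative assumptions) treats each microphone channel as a node of a fully connected graph, runs a few rounds of message passing across channels, and predicts a spectral mask for a reference channel.

```python
# Minimal sketch of "each channel is a graph node": per-channel encoding,
# message passing across channels, and a mask for a reference channel.
import torch
import torch.nn as nn


class ChannelGraphLayer(nn.Module):
    """One round of message passing over the fully connected channel graph."""

    def __init__(self, dim):
        super().__init__()
        self.self_proj = nn.Linear(dim, dim)    # transform of the node itself
        self.neigh_proj = nn.Linear(dim, dim)   # transform of the aggregated neighbours

    def forward(self, h):
        # h: (batch, channels, dim) -- one feature vector per microphone/node.
        c = h.shape[1]
        # Mean of all other nodes = neighbourhood aggregation on a complete graph.
        neigh = (h.sum(dim=1, keepdim=True) - h) / max(c - 1, 1)
        return torch.relu(self.self_proj(h) + self.neigh_proj(neigh))


class GraphEnhancer(nn.Module):
    """Encode each channel, apply GNN layers across channels, mask channel 0."""

    def __init__(self, n_freq=257, dim=128, n_layers=2):
        super().__init__()
        self.encoder = nn.Linear(n_freq, dim)
        self.gnn = nn.ModuleList([ChannelGraphLayer(dim) for _ in range(n_layers)])
        self.mask_head = nn.Linear(dim, n_freq)

    def forward(self, mag):
        # mag: (batch, channels, frames, n_freq) magnitude spectrograms.
        b, c, t, f = mag.shape
        h = torch.relu(self.encoder(mag))                  # (b, c, t, dim)
        h = h.permute(0, 2, 1, 3).reshape(b * t, c, -1)    # message passing per frame
        for layer in self.gnn:
            h = layer(h)
        h = h.reshape(b, t, c, -1).permute(0, 2, 1, 3)     # back to (b, c, t, dim)
        mask = torch.sigmoid(self.mask_head(h[:, 0]))      # mask for reference channel 0
        return mask * mag[:, 0]                            # enhanced reference spectrogram


# Example: enhanced = GraphEnhancer()(torch.rand(2, 4, 100, 257))  # 4-mic input
```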

Cited by 28 publications (31 citation statements) | References 25 publications

Citation statements:
“…This type of data is the easiest to obtain, since a wide variety of voice types and physical setups can be generated instantly. Many machine learning baselines, e.g., [37,38,69], only train and evaluate on synthetic data generated in this manner. To generate the synthetic dataset, we create multi-speaker recordings in simulated environments with reverb and background noises.…”
Section: Training Methodology (mentioning)
confidence: 99%
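The synthetic data generation described in this excerpt (simulated rooms, reverberation, background noise) typically looks something like the sketch below; it uses pyroomacoustics, and the room geometry, source/microphone positions, and SNR are illustrative assumptions rather than the cited papers' settings.

```python
# Sketch: two speakers in a simulated reverberant room, recorded by a small
# microphone array, with additive background noise at a chosen SNR.
import numpy as np
import pyroomacoustics as pra

fs = 16000
speech1 = np.random.randn(4 * fs)   # placeholders; use real speech in practice
speech2 = np.random.randn(4 * fs)

# Shoebox room with uniform wall absorption (illustrative values).
room = pra.ShoeBox([6.0, 5.0, 3.0], fs=fs, materials=pra.Material(0.3), max_order=10)
room.add_source([2.0, 3.5, 1.5], signal=speech1)
room.add_source([4.5, 1.5, 1.5], signal=speech2)

# Four microphones spaced 5 cm apart along the x axis.
mic_xyz = np.array([[3.00, 3.05, 3.10, 3.15],
                    [2.50, 2.50, 2.50, 2.50],
                    [1.20, 1.20, 1.20, 1.20]])
room.add_microphone_array(pra.MicrophoneArray(mic_xyz, fs))

room.simulate()                      # convolves sources with simulated RIRs
mix = room.mic_array.signals         # shape: (n_mics, n_samples)

# Add diffuse background noise at roughly 10 dB SNR.
noise = np.random.randn(*mix.shape)
snr_db = 10.0
scale = np.sqrt(np.mean(mix ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
noisy_mix = mix + scale * noise
```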
“…Multi-channel source separation and speech enhancement. Multichannel methods have been shown to perform better than their single-channel source separation counterparts [18], [34]–[38]. Binaural methods have also been used for source separation [39]–[42] and localization [43]–[45]; [40] reduces the look-ahead time in the network to make it causal in behavior but has not been demonstrated to run on a mobile device.…”
Section: Related Work (mentioning)
confidence: 99%
“…In another work, speech signals are represented as graphs to better capture the global feature representation for speech emotion recognition [21], where deep frame-level features are generated by an LSTM followed by a GNN to classify the graph representation of utterances. In another recent work [22], each audio channel is viewed as a node while constructing a speech graph for the speech enhancement task. This allows for the discovery of spatial correlation among several channels.…”
Section: B. Graph Neural Network in Audio (mentioning)
confidence: 99%
“…A systematic approach to design such networks is to use processing blocks that are number- and permutation-invariant, such as global pooling and self-attention [13]. Some recent works have investigated DNNs for ad-hoc array processing [11], [14]–[16]. Luo et al. [14] proposed a novel transform-average-concatenate module to deal with unknown number and order, and graph neural networks were investigated for distributed arrays by Tzirakis et al. [11].…”
Section: Introduction (mentioning)
confidence: 99%
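The transform-average-concatenate (TAC) module credited to Luo et al. [14] in the excerpt above can be illustrated as: a shared per-channel transform, an average across channels that is invariant to microphone count and ordering, and a concatenation of that average back onto each channel. A minimal PyTorch sketch under those assumptions follows (hidden sizes and the residual connection are illustrative choices, not the published configuration).

```python
# Minimal sketch of a transform-average-concatenate (TAC) block: shared
# transform per channel, order-invariant cross-channel average, concatenation.
import torch
import torch.nn as nn


class TACBlock(nn.Module):
    def __init__(self, dim, hidden=384):
        super().__init__()
        self.transform = nn.Sequential(nn.Linear(dim, hidden), nn.PReLU())
        self.average = nn.Sequential(nn.Linear(hidden, hidden), nn.PReLU())
        self.concat = nn.Sequential(nn.Linear(2 * hidden, dim), nn.PReLU())

    def forward(self, x):
        # x: (batch, channels, dim); works for any number/ordering of channels.
        z = self.transform(x)                          # per-channel transform
        a = self.average(z.mean(dim=1, keepdim=True))  # order-invariant pooling
        a = a.expand(-1, x.shape[1], -1)               # broadcast back to channels
        return x + self.concat(torch.cat([z, a], dim=-1))  # residual output
```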
“…Some recent works have investigated DNNs for ad-hoc array processing [11], [14]–[16]. Luo et al. [14] proposed a novel transform-average-concatenate module to deal with unknown number and order, and graph neural networks were investigated for distributed arrays by Tzirakis et al. [11]. Wang et al. [17] proposed a spatio-temporal network where a recurrent network was used for temporal modeling and self-attention was used for spatial modeling.…”
Section: Introduction (mentioning)
confidence: 99%
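The spatio-temporal pattern attributed to Wang et al. [17] in this excerpt (a recurrent network along time, self-attention across microphones) might be sketched roughly as below; the single-block structure and layer sizes are assumptions for illustration only, not the cited design.

```python
# Sketch: GRU over time per channel, then self-attention over channels per frame.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.temporal = nn.GRU(dim, dim, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        # x: (batch, channels, frames, dim)
        b, c, t, d = x.shape
        # Temporal modelling: run the GRU independently on each channel.
        h, _ = self.temporal(x.reshape(b * c, t, d))
        h = h.reshape(b, c, t, d)
        # Spatial modelling: self-attention over channels at each time frame.
        s = h.permute(0, 2, 1, 3).reshape(b * t, c, d)
        s, _ = self.spatial(s, s, s)
        return s.reshape(b, t, c, d).permute(0, 2, 1, 3)
```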