2020
DOI: 10.3390/s20071883

Joint Optimization of Deep Neural Network-Based Dereverberation and Beamforming for Sound Event Detection in Multi-Channel Environments

Abstract: In this paper, we propose joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, the short-time Fourier transform (STFT) coefficients are calculated from multi-channel audio signals in noisy and reverberant environments, and are then enhanced by DNN-supported weighted prediction error (WPE) dereverberation with the estimated masks. Next, t…

Cited by 12 publications (10 citation statements)
References 34 publications
“…On the other hand, several sound events of short duration (e.g., "object impact" and "keyboard typing") have more than one audio pattern, such as attack, decay, and release parts. To control the training weight of the sound event model in accordance with the ease/difficulty of model training, the use of focal loss has been proposed [19,20]. In this paper, we newly introduce the following asymmetric focal loss (AFL), which enables the control of the focusing factor of active and inactive frames separately.…”
Section: Asymmetric Focal Loss (AFL)
Type: mentioning
confidence: 99%
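The quoted statement introduces an asymmetric focal loss but the formula itself is not reproduced here. As a minimal sketch, assuming a frame-wise binary cross-entropy in which active frames are weighted by (1 - p)^gamma and inactive frames by p^zeta, the idea could look like the following. The function name, the parameter names `gamma` and `zeta`, and the exact form of the loss are assumptions for illustration, not the citing paper's definition.

```python
import math


def asymmetric_focal_loss(probs, targets, gamma=1.0, zeta=1.0, eps=1e-7):
    """Mean frame-wise loss with separate focusing factors (illustrative form).

    probs   -- predicted event probabilities per frame, each in [0, 1]
    targets -- ground-truth labels per frame (1 = active, 0 = inactive)
    gamma   -- assumed focusing factor applied to active frames
    zeta    -- assumed focusing factor applied to inactive frames
    """
    total = 0.0
    for p, z in zip(probs, targets):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # Active frames: down-weight frames that are already well predicted.
        active = -((1.0 - p) ** gamma) * z * math.log(p)
        # Inactive frames: a separate exponent controls their contribution.
        inactive = -(p ** zeta) * (1.0 - z) * math.log(1.0 - p)
        total += active + inactive
    return total / len(probs)
```

With `gamma=0` and `zeta=0` this sketch reduces to ordinary binary cross-entropy, which makes it easy to see the effect of each factor in isolation.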
“…However, the performances of TDNNs have not been investigated using sounds in different acoustic environments. To enhance the performance of sound event detection, other DNN-based acoustic techniques such as noise reduction [43], and dereverberation and beamforming [44] have been investigated. In our study for AER and ICL, the effectiveness of transfer learning was demonstrated using TDNNs pre-trained on the sounds of AudioSet to extract the embedding of acoustic events.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Both approaches are mainly based on estimating the time difference of arrival (TDOA) obtained by using various configurations of microphone arrays, such as linear array [12], circular array [13], or distributed array [14] and different cross-correlation algorithm to estimate time lag between microphones. The first approach aims to maximize the steered response power (SRP) of the output of a delay and sum beamformer [15]. This direct approach performs an exhaustive search in the whole SRP space to find a sound location, which is found to be computationally expensive.…”
Section: Introduction
Type: mentioning
confidence: 99%
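The exhaustive steered-response-power search described in the statement above can be illustrated for the simplest case of two microphones and integer sample lags. This is a toy sketch under those assumptions, not the cited method; the function names `steered_power` and `srp_search` are hypothetical.

```python
def steered_power(x, y, lag):
    """Power of the delay-and-sum output x[n] + y[n + lag] over overlapping samples."""
    total = 0.0
    for n, xn in enumerate(x):
        m = n + lag
        if 0 <= m < len(y):
            total += (xn + y[m]) ** 2
    return total


def srp_search(x, y, max_lag):
    """Try every lag in [-max_lag, max_lag] and return the one with maximum power."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: steered_power(x, y, lag))


# Toy signals: y is x delayed by two samples.
x = [0, 0, 1, 2, 1, 0, 0, 0]
y = [0, 0, 0, 0, 1, 2, 1, 0]
best_lag = srp_search(x, y, max_lag=3)
```

Every candidate lag is evaluated, so the cost grows with the size of the search space. With more microphones and a grid of candidate source positions, this is the computational expense the statement refers to.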