2020 28th European Signal Processing Conference (EUSIPCO), 2021
DOI: 10.23919/eusipco47968.2020.9287436

Foreground-Background Ambient Sound Scene Separation

Abstract: Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separation. We propose a deep learning-based separation framework with a suitable feature normalization scheme and an optional auxiliary network capturing the background statistics, and we investigate its ability to handle the great variety of sound classes encountered in am…
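The abstract describes a separation framework built on feature normalization and time-frequency masking. A minimal sketch of those two ingredients is given below; the function names, the per-frequency normalization scheme, and the complementary-mask formulation are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def normalize_features(log_mag, eps=1e-8):
    """Per-frequency mean/variance normalization of log-magnitude features.

    One common normalization scheme for separation networks; the scheme
    actually used in the paper may differ (hypothetical sketch).
    """
    mu = log_mag.mean(axis=1, keepdims=True)     # mean over time frames
    sigma = log_mag.std(axis=1, keepdims=True)   # std over time frames
    return (log_mag - mu) / (sigma + eps)

def apply_mask(mixture_stft, mask):
    """Apply a [0, 1] time-frequency mask to estimate the foreground;
    the complementary mask recovers the background estimate."""
    foreground = mask * mixture_stft
    background = (1.0 - mask) * mixture_stft
    return foreground, background
```

By construction the two estimates sum back to the mixture, which is why a single mask suffices for a two-source foreground/background split.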

Cited by 7 publications (3 citation statements) | References 19 publications
“…There are two types of sound separation systems. The first type of system uses as many outputs as the number of possible SE classes [25], [26], [28], [32]. Such systems perform both separation and identification and could thus be used directly for TSE.…”
Section: A. Source Separation
Confidence: 99%
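The first type of system described in the statement above can be sketched as a network head that emits one time-frequency mask per possible sound-event class, so that separation and class identification happen jointly. The class count, dimensions, and random stand-in for the network output below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 4              # e.g. speech, siren, footsteps, background (illustrative)
N_FREQ, N_FRAMES = 257, 100

# Stand-in for the network's final layer: per-class logits over the T-F plane.
logits = rng.standard_normal((N_CLASSES, N_FREQ, N_FRAMES))

# A softmax across classes yields one mask per class; the masks sum to 1 at
# each T-F bin, so each bin's energy is attributed to exactly one class mix.
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

mixture = rng.standard_normal((N_FREQ, N_FRAMES))
estimates = masks * mixture   # one estimated source per class
```

Because the output slots are tied to fixed classes, the desired source can be picked by index, which is why such systems can be used directly for target sound extraction (TSE).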
“…[38,39]). Consequently, a variety of solutions have also been explored, including the use of transfer learning following a pretraining using general-purpose datasets [37,40,41], source separation and denoising [42,43], discriminative training [44], domain adaptation based on covariance normalization [45], unsupervised domain adaptation [46,47], domain/context adaptive neural networks [36,48], etc. Use of data augmentation techniques to address domain mismatch problems was common among the top-performing solutions in a public challenge focused on developing generalizable methods for detecting birds [49].…”
Section: Introduction
Confidence: 99%
“…This enables non-stationary audio waveforms to be represented as approximately stationary frames of the signal. In commonly used methods for audio signal processing such as speech, music genre, animal, or ambient sound recognition, the typical frame length is from 10 ms to 40 ms [15][16][17][18][19].…”
Section: Introduction
Confidence: 99%
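The frame-based processing described above can be sketched as follows; the 25 ms frame length and 10 ms hop are common defaults within the 10-40 ms range the statement cites, not values taken from the paper.

```python
import numpy as np

def frame_signal(x, sr, frame_ms=25, hop_ms=10):
    """Slice a 1-D waveform into short, approximately stationary frames.

    frame_ms and hop_ms are illustrative defaults; 10-40 ms frame lengths
    are typical for audio analysis.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Each row is one frame; consecutive frames overlap by frame_len - hop.
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

# 1 s of noise at 16 kHz -> 400-sample frames taken every 160 samples
sr = 16000
frames = frame_signal(np.random.randn(sr), sr)
```

Each row can then be windowed and transformed independently (e.g. via an STFT), which is what makes the short-time stationarity assumption useful in practice.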