ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053921

Improving Universal Sound Separation Using Sound Classification

Abstract: Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class. In this paper, we utilize the semantic information learned b…

Cited by 60 publications (48 citation statements) · References 22 publications
“…Tzinis et al [4] performed separation experiments with a fixed number of sources on the 50-class ESC-50 dataset [5]. Other papers have leveraged information about sound class, either as conditioning information or as a weak supervision signal [6,2,7].…”
Section: Relation To Prior Work (mentioning)
confidence: 99%
“…Finally, we have released a masking-based baseline separation model, based on an improved time-domain convolutional network (TDCN++), described in our recent publications [1,2]. On the FUSS test set, this model achieves 9.8 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source inputs with 35.8 dB SI-SNR.…”
Section: Introduction (mentioning)
confidence: 99%
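
The quoted statement reports separation quality as SI-SNR and SI-SNRi. As a point of reference, here is a minimal sketch (not code from the cited work) of how scale-invariant SNR and its improvement over the unprocessed mixture are commonly computed; the function and variable names are illustrative.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    estimate = estimate - estimate.mean()    # remove means so DC offsets do not affect the metric
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

def si_snr_improvement(estimate, reference, mixture):
    """SI-SNRi: gain of the separated estimate over simply using the input mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```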
“…Kavalerov et al [1] have recently shown that by constructing a suitable dataset of mixtures of sounds, one can obtain SI-SDR (scale-invariant SDR) improvements of up to 10 dB. These results can be further improved by conditioning the sound separation model on the embeddings computed by a pre-trained audio classifier, which can also be fine-tuned [9]. More recently, a completely unsupervised approach to universal sound separation was developed [10], which does not require any single-source ground truth data, but instead can be trained using only real-world audio mixtures.…”
Section: Related Work (mentioning)
confidence: 99%
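
The statement above refers to conditioning a separation model on embeddings from a pre-trained audio classifier. The sketch below shows one generic way such conditioning can be wired in, using a FiLM-style per-channel scale and shift; the layer sizes, module name, and modulation scheme are assumptions for illustration, not the architecture used in the cited papers.

```python
import torch
import torch.nn as nn

class ConditionedSeparatorBlock(nn.Module):
    """One convolutional block whose activations are modulated by a class embedding."""
    def __init__(self, feat_dim=256, embed_dim=128):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1)
        # Map the classifier embedding to a per-channel scale and shift.
        self.to_scale = nn.Linear(embed_dim, feat_dim)
        self.to_shift = nn.Linear(embed_dim, feat_dim)

    def forward(self, features, class_embedding):
        # features: (batch, feat_dim, time); class_embedding: (batch, embed_dim)
        x = self.conv(features)
        scale = self.to_scale(class_embedding).unsqueeze(-1)  # (batch, feat_dim, 1)
        shift = self.to_shift(class_embedding).unsqueeze(-1)
        return torch.relu(x * scale + shift)
```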
“…The PASE loss term enforces the frame permutations to align with the best utterance permutation π*_u. Previous works also show that conditioning source separation models with additional features improves performance [25][26][27], but whether feature conditioning helps reduce permutation errors has yet to be confirmed. Towards this end, we extend the single-stage uPIT+PASE paradigm to a two-stage cascaded system.…”
Section: uPIT + PASE For Conv-TasNet (mentioning)
confidence: 99%
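
For readers unfamiliar with uPIT, the following sketch illustrates the "best utterance permutation" the statement mentions: the training loss is evaluated under every assignment of estimated sources to references and the minimum is kept. This is a generic illustration with assumed tensor shapes, not the uPIT+PASE implementation from the cited work; a typical choice for the pairwise loss is negative SI-SNR.

```python
from itertools import permutations
import torch

def upit_loss(estimates, references, pairwise_loss):
    """Utterance-level permutation-invariant loss.

    estimates, references: tensors of shape (batch, num_sources, time).
    pairwise_loss: callable returning a per-example loss of shape (batch,),
    e.g. negative SI-SNR between one estimated source and one reference.
    """
    num_sources = references.shape[1]
    best = None
    for perm in permutations(range(num_sources)):
        # Average loss when estimate i is matched to reference perm[i].
        loss = sum(pairwise_loss(estimates[:, i], references[:, p])
                   for i, p in enumerate(perm)) / num_sources
        # Keep, per example, the lowest loss seen over all permutations.
        best = loss if best is None else torch.minimum(best, loss)
    return best.mean()
```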