Compute and memory efficient universal sound source separation

Tzinis, Efthymios; Wang, Zhepei; Jiang, Xilin; Smaragdis, Paris

doi:10.48550/arxiv.2103.02644

Cited by 2 publications

(2 citation statements)

References 33 publications

“…In future works, we will consider training the system on a larger dataset, which may include audio clips with multiple AE classes, or unlabeled data [8,9]. Moreover, we will also investigate other EL [13] to provide more discriminative embedding vectors and extend the model to perform online processing [30].…”

Section: Discussion (mentioning)

confidence: 99%

Few-Shot Learning of New Sound Classes for Target Sound Extraction

Delcroix, Vázquez, Ochiai et al. 2021

Interspeech 2021

Target sound extraction consists of extracting the sound of a target acoustic event (AE) class from a mixture of AE sounds. It can be realized using a neural network that extracts the target sound conditioned on a 1-hot vector that represents the desired AE class. With this approach, embedding vectors associated with the AE classes are directly optimized for the extraction of sound classes seen during training. However, it is not easy to extend this framework to new AE classes, i.e., classes unseen during training. Recently, speech, music, or AE sound extraction based on enrollment audio of the desired sound has offered the potential of extracting any target sound in a mixture given only a short audio signal of a similar sound. In this work, we propose combining 1-hot- and enrollment-based target sound extraction, allowing optimal performance for seen AE classes and simple extension to new classes. In experiments with synthesized sound mixtures generated with the Freesound Dataset (FSD), we demonstrate the benefit of the combined framework for both seen and new AE classes. In addition, we propose adapting the embedding vectors obtained from a few enrollment audio samples (few-shot) to further improve performance on new classes.

“…We force the estimated sources to add up to the input mixture using a mixture consistency layer [28] at the output of our separation model. For all the other parameters we choose the default settings provided in [29] for a sampling rate of 16 kHz.…”

Section: Separation Model (mentioning)

confidence: 99%

Separate But Together: Unsupervised Federated Learning for Speech Enhancement from Non-IID Data

Tzinis, Casebeer, Wang et al. 2021

2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Self Cite

We propose FEDENHANCE, an unsupervised federated learning (FL) approach for speech enhancement and separation with non-IID distributed data across multiple clients. We simulate a real-world scenario where each client only has access to a few noisy recordings from a limited and disjoint number of speakers (hence non-IID). Each client trains its model in isolation using mixture invariant training while periodically providing updates to a central server. Our experiments show that our approach achieves competitive enhancement performance compared to IID training on a single device and that we can further improve the convergence speed and the overall performance using transfer learning on the server side. Moreover, we show that we can effectively combine updates from clients trained locally with supervised and unsupervised losses. We also release a new dataset, LibriFSD50K, and its creation recipe in order to facilitate FL research for source separation problems.
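The citation statement above mentions a mixture consistency layer that forces the estimated sources to add up to the input mixture. The following is a minimal sketch of one common form of that idea, assuming an unweighted projection that spreads the residual equally across the sources. The function name `mixture_consistency`, the array shapes, and the random test signals are illustrative assumptions; the exact variant used in the citing paper (its reference [28]) is not specified here.

```python
import numpy as np


def mixture_consistency(est_sources, mixture):
    """Project source estimates so that they sum to the mixture.

    Assumption: unweighted projection, i.e. the residual is split
    equally among all sources. This is a sketch, not the cited method.

    est_sources: array of shape (n_sources, n_samples)
    mixture:     array of shape (n_samples,)
    """
    n_sources = est_sources.shape[0]
    # Part of the mixture that the current estimates fail to explain.
    residual = mixture - est_sources.sum(axis=0)
    # Add an equal share of the residual to every estimate.
    return est_sources + residual / n_sources


# Hypothetical signals: one second of audio at 16 kHz, two estimates.
rng = np.random.default_rng(0)
mixture = rng.standard_normal(16000)
est_sources = rng.standard_normal((2, 16000))

projected = mixture_consistency(est_sources, mixture)
```

After the projection, `projected.sum(axis=0)` matches `mixture` up to floating-point error, regardless of how far the original estimates were from summing to it.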
