2022
DOI: 10.1109/jstsp.2022.3200911

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing

Abstract: We present RemixIT, a simple yet effective self-supervised method for training speech enhancement models without requiring a single isolated in-domain speech or noise waveform. Our approach overcomes limitations of previous methods that depend on clean in-domain target signals and are thus sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a teacher model pre-trained on out-of-domain data infers estimated pseudo-target signals…
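The abstract's bootstrapped-remixing idea can be sketched as a single training step: a frozen out-of-domain teacher separates each noisy mixture into speech and noise estimates, the noise estimates are shuffled across the batch and recombined into fresh in-domain mixtures, and the student trains supervised on those remixed inputs against the teacher's speech estimates. This is a minimal illustrative sketch, not the paper's implementation; the function name, model interfaces, and loss are assumptions, and the paper's continual teacher updates from the student are omitted here.

```python
import torch

def remixit_step(teacher, student, noisy_batch, optimizer, loss_fn):
    """One hypothetical RemixIT-style training step on a batch of noisy mixtures."""
    # Teacher (pre-trained on out-of-domain data) infers pseudo-targets.
    with torch.no_grad():
        est_speech = teacher(noisy_batch)      # pseudo-clean speech estimate
        est_noise = noisy_batch - est_speech   # residual treated as pseudo-noise
        # Bootstrapped remixing: permute the noise estimates across the batch
        # and recombine, creating new in-domain mixtures with known pseudo-targets.
        perm = torch.randperm(noisy_batch.shape[0])
        remixed = est_speech + est_noise[perm]
    # Train the student supervised on (remixed mixture -> pseudo-target) pairs.
    pred = student(remixed)
    loss = loss_fn(pred, est_speech)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the full scheme the teacher is also refreshed from the student over time (continual self-training), so the pseudo-targets improve as training progresses; the sketch above shows only the student-update half of that loop.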

Cited by 22 publications (8 citation statements) · References 55 publications
“…The problem has been also tackled from the perspective of generative modeling, with the use of variational auto-encoders [9], [37]. Finally, teacher-student training schemes have been employed, in which an OOD teacher model is used to provide targets for supervised training of a student model on target data [38], [39]. Although numerous approaches are available, no systematic comparison on a common data corpus has been done in the literature.…”
Section: Related Work
confidence: 99%
“…Although numerous approaches are available, no systematic comparison on a common data corpus has been done in the literature. In this article, we compare our method to noisy-target training (Nytt) [24], and RemixIT [38], a recent teacher-student training scheme.…”
Section: Related Work
confidence: 99%
“…In addition, unsupervised or semi-supervised training methods have also been investigated to achieve a general solution even on out-of-distribution datasets. A particularly interesting method is the teacher-student training method proposed in RemixIT [36] where a teacher network trained on out-of-distribution data is used to bootstrap the noisy signals to multiply the variety of in-distribution data samples.…”
Section: Current State-of-the-art Solutions
confidence: 99%
“…Methodologies inspired by conventional deep learning, e.g. multi-timescale networks [29,31,36] or attention [28], if mapped efficiently to the neuromorphic domain, could be promising directions as well. And finally for completeness-to address the third question posed above-decoding the output of the neuromorphically-processed audio again depends on the processing used and must be tailored appropriately to operate in an efficient manner.…”
Section: Neuromorphic Audio Processing and Promising Directions
confidence: 99%