Speech enhancement for drone audition is challenging because of the strong ego-noise from the rotating motors and propellers, which leads to extremely low signal-to-noise ratios (e.g. SNR < -15 dB) at the onboard microphones. In this paper, we extensively assess the ability of single-channel deep learning approaches to reduce ego-noise on drones. We train twelve representative deep neural network (DNN) models, covering three operation domains (time-frequency magnitude domain, time-frequency complex domain and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder and generative). We critically discuss and compare the performance of these models in extremely low-SNR scenarios, ranging from -30 to 0 dB. We show that time-frequency complex domain and UNet encoder-decoder architectures outperform other approaches on speech enhancement measures while providing a good trade-off with other criteria, such as model size, computational complexity and context length. Specifically, the best-performing model is DCUNet, a UNet model operating in the time-frequency complex domain, which, at an input SNR of -15 dB, improves ESTOI from 0.1 to 0.4, PESQ from 1.0 to 1.9 and SI-SDR from -15 dB to 3.7 dB. Based on the insights drawn from these findings, we discuss future research in drone ego-noise reduction.
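To make the reported figures concrete, the following is a minimal sketch of how a test mixture at a given input SNR might be built and how SI-SDR is commonly computed. It is not the authors' evaluation code: the helper names `mix_at_snr` and `si_sdr`, the synthetic tone standing in for speech, the Gaussian noise standing in for ego-noise, and the 16 kHz sampling rate are all assumptions made for illustration. Some toolkits also remove the mean of each signal before computing SI-SDR, which is omitted here.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + scaled noise has the requested SNR (dB)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

def si_sdr(estimate, reference):
    """Scale-invariant signal-to-distortion ratio in dB (no mean removal)."""
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(residual ** 2))

# Hypothetical stand-ins: a 220 Hz tone for speech, white noise for ego-noise.
rng = np.random.default_rng(0)
t = np.arange(64000) / 16000.0  # 4 s at an assumed 16 kHz sampling rate
speech = np.sin(2.0 * np.pi * 220.0 * t)
noise = rng.standard_normal(t.shape)

mixture, scaled_noise = mix_at_snr(speech, noise, snr_db=-15.0)
input_snr = 10.0 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
mixture_si_sdr = si_sdr(mixture, speech)
print(f"input SNR: {input_snr:.2f} dB, SI-SDR of mixture: {mixture_si_sdr:.2f} dB")
```

When the noise is nearly uncorrelated with the speech, the SI-SDR of the unprocessed mixture is close to the input SNR, which is consistent with the abstract quoting -15 dB as the starting SI-SDR at -15 dB input SNR.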
Deep learning has advanced the state of the art of single-channel speech separation. However, separation models may overfit the training data, and generalization across datasets is still an open problem in real-world conditions with noise. In this paper, we address the generalization problem with Mixup as a data augmentation approach. Mixup creates new training examples from linear combinations of samples during mini-batch training. We propose four variations of Mixup and assess the improved generalization of a speech separation model, DPRNN, with cross-corpus evaluation on the LibriMix, TIMIT and VCTK datasets. DPRNN allows efficient modelling of longer input sequences by splitting the representation learnt from an input mixture segment into small chunks and performing intra- and inter-chunk operations iteratively. We show that training DPRNN with the proposed Data-only Mixup augmentation variation improves performance on an unseen dataset in noisy conditions when compared to the baseline models augmented with SpecAugment, while having comparable performance on the source dataset.
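The linear-combination step described above can be sketched as follows. This shows generic Mixup applied to a mini-batch of mixtures and their source targets; it is not one of the four variations proposed in the paper (including Data-only Mixup), whose details the abstract does not give. The function name `mixup_batch`, the Beta parameter `alpha=0.4`, the array shapes, and the choice to mix the targets with the same weight are all assumptions made for illustration.

```python
import numpy as np

def mixup_batch(mixtures, sources, alpha=0.4, rng=None):
    """Generic Mixup on one mini-batch (illustrative, not the paper's variants).

    mixtures: array of shape (batch, time), the network inputs
    sources:  array of shape (batch, n_sources, time), the training targets
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Pair every example with another example from the same batch.
    perm = rng.permutation(mixtures.shape[0])
    mixed_inputs = lam * mixtures + (1.0 - lam) * mixtures[perm]
    # Assumption: targets are combined with the same weight as the inputs.
    mixed_targets = lam * sources + (1.0 - lam) * sources[perm]
    return mixed_inputs, mixed_targets, lam, perm

# Hypothetical batch: 4 examples, 2 sources each, 8000 samples long.
rng = np.random.default_rng(0)
sources = rng.standard_normal((4, 2, 8000))
mixtures = sources.sum(axis=1)  # noiseless mixtures for simplicity

mixed_inputs, mixed_targets, lam, perm = mixup_batch(mixtures, sources, rng=rng)
print(mixed_inputs.shape, mixed_targets.shape)
```

Because both the inputs and the targets are combined with the same weight, each augmented mixture still equals the sum of its augmented sources in this noiseless setup, so the new examples remain valid separation targets.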