Ashish Alex scite author profile

Speech enhancement for drone audition is made challenging by the strong ego-noise from the rotating motors and propellers, which leads to extremely low signal-to-noise ratios (e.g. SNR < -15 dB) at onboard microphones. In this paper, we extensively assess the ability of single-channel deep learning approaches to ego-noise reduction on drones. We train twelve representative deep neural network (DNN) models, covering three operation domains (time-frequency magnitude domain, time-frequency complex domain and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder and generative). We critically discuss and compare the performance of these models in extremely low-SNR scenarios, ranging from -30 to 0 dB. We show that time-frequency complex domain and UNet encoderdecoder architectures outperform other approaches on speech enhancement measures while providing a good trade-off with other criteria, such as model size, computation complexity and context length. Specifically, the best-performing model is DCUNet, a UNet model operating in the time-frequency complex domain, which, at input SNR -15 dB, improves ESTOI from 0.1 to 0.4, PESQ from 1.0 to 1.9 and SI-SDR from -15 dB to 3.7 dB. Based on the insights drawn from these findings, we discuss future research in drone ego-noise reduction.

show abstract

Mixup Augmentation for Generalizable Speech Separation

Alex

Wang

Gastaldo

et al. 2021

View full text Add to dashboard Cite

Deep learning has advanced the state of the art of single-channel speech separation. However, separation models may overfit the training data and generalization across datasets is still an open problem in real-world conditions with noise. In this paper we address the generalization problem with Mixup as data augmentation approach. Mixup creates new training examples from linear combinations of samples during mini-batch training.We propose four variations of Mixup and assess the improved generalization of a speech separation model, DPRNN, with cross-corpus evaluation on LibriMix, TIMIT and VCTK datasets. DPRNN allows efficient modelling of longer input sequences by splitting the learnt representation from input mixture segment into small chunks and performing intra and inter chunk operations iteratively. We show that training DPRNN with the proposed Data-only Mixup augmentation variation improves performance on an unseen dataset in noisy conditions when compared to the baseline SpecAugment augmented models, while having comparable performance on the source dataset.

show abstract

Data augmentation for speech separation

Alex

Wang

Gastaldo

et al. 2023

Speech Communication

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Ashish Alex

Deep Learning Models for Single-Channel Speech Enhancement on Drones

Mixup Augmentation for Generalizable Speech Separation

Data augmentation for speech separation

Contact Info

Product

Resources

About