ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747215

ICASSP 2022 Acoustic Echo Cancellation Challenge

Abstract: The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. The speech signal quality can be measured with SIG in ITU-T P.835 and is still a top issue in audio communication and conferencing systems. For example, in the ICASSP 2022 Deep Noise Suppression challenge, the improvement in the background and overall quality is impressive, but the improvement in the speech signal is statistically zero. To improve th…

Cited by 53 publications (40 citation statements) | References 37 publications
“…For example, the recent deep noise suppression challenges [3] target at speech enhancement in a monaural teleconferencing setup, requiring a processing latency less than 40 ms on a specified Intel i5 processor. Similar latency requirements exist in other related challenges [4], [5]. The recent Clarity challenge [6] aims at multi-microphone speech enhancement in a hearing aid setup, requiring an algorithmic latency of at maximum 5 ms.…”
Section: Introduction (mentioning)
confidence: 92%
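As a rough illustration of what such latency constraints mean for a frame-based enhancer, the sketch below computes the algorithmic latency implied by an assumed STFT window and hop; the 20 ms / 10 ms framing and 16 kHz rate are illustrative assumptions, not parameters of any of the cited challenges.

```python
# Illustrative framing only -- not the settings of any cited challenge.
sample_rate_hz = 16_000
window_len = 320   # 20 ms analysis/synthesis window
hop_len = 160      # 10 ms hop (50% overlap)

# For a causal overlap-add STFT enhancer, the algorithmic latency is on the
# order of one window: a frame can only be synthesized once its last input
# sample has arrived.
algorithmic_latency_ms = 1000 * window_len / sample_rate_hz
print(f"algorithmic latency ≈ {algorithmic_latency_ms:.0f} ms")  # 20 ms
```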
“…For training data, we created a system id dataset by convolving the far-end speech recordings from the single-talk portion of the Microsoft AEC Challenge [80] with room impulse responses (RIRs) from [81]. At test time, we truncate all RIRs to 1024 taps.…”
Section: B. Experimental Design (mentioning)
confidence: 99%
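The excerpt above describes building a system-identification dataset by convolving far-end speech with room impulse responses (RIRs) and truncating the RIRs to 1024 taps. The sketch below shows that kind of convolution in generic NumPy/SciPy code; the synthetic placeholder signals stand in for the actual recordings of [80] and RIRs of [81], which are not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

# Placeholders standing in for a far-end recording from [80] and an RIR
# from [81]; the real dataset files are not used here.
fs = 16_000
farend = rng.standard_normal(10 * fs)                              # 10 s of "far-end speech"
rir = rng.standard_normal(4096) * np.exp(-np.arange(4096) / 800.0)  # decaying synthetic RIR

# Truncate the RIR to 1024 taps, as done at test time in the cited work.
rir = rir[:1024]

# The synthetic echo is the far-end signal convolved with the truncated RIR.
echo = fftconvolve(farend, rir, mode="full")[: len(farend)]
```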
“…When averaging, we discard silent frames using an energy-threshold VAD. In scenes with near-end speech, we use STOI ∈ [0, 1] to measure the preservation of near-end speech. With respect to datasets for single-talk, double-talk, and double-talk with path-change experiments, we re-mix the synthetic fold of [80] with impulse responses from [81]. We partition [81] into non-overlapping train, test, and validation folds and set the signal-to-echo-ratio randomly between [−10, 10] with uniform distribution.…”
Section: B. Experimental Design (mentioning)
confidence: 99%
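The passage above mentions two concrete steps: discarding silent frames with an energy-threshold VAD, and mixing near-end speech and echo at a signal-to-echo ratio (SER) drawn uniformly from [−10, 10] dB. The sketch below is a minimal version of both, assuming a simple per-frame energy threshold and power-based SER scaling; the frame length, threshold, and placeholder signals are illustrative, not the authors' exact recipe.

```python
import numpy as np

def energy_vad(x, frame_len=320, threshold_db=-40.0):
    """Boolean mask of frames whose energy is within threshold_db of the
    loudest frame -- a simple energy-threshold VAD for discarding silence."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

def mix_at_ser(near_end, echo, ser_db):
    """Scale the echo so the near-end-to-echo power ratio equals ser_db (dB),
    then return the microphone mixture."""
    near_pow = np.mean(near_end**2) + 1e-12
    echo_pow = np.mean(echo**2) + 1e-12
    gain = np.sqrt(near_pow / (echo_pow * 10 ** (ser_db / 10)))
    return near_end + gain * echo

rng = np.random.default_rng(0)
near_end = rng.standard_normal(16_000)   # placeholder near-end speech
echo = rng.standard_normal(16_000)       # placeholder echo signal
ser_db = rng.uniform(-10.0, 10.0)        # SER drawn uniformly from [-10, 10] dB
mic = mix_at_ser(near_end, echo, ser_db)
active = energy_vad(mic)                 # frames kept when averaging metrics
```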
“…1. Again, AEC and PF are trained in two separate steps, now, however, on the ICASSP 2022 AEC Challenge synthetic FB dataset [24], further referenced as Dsyn, which also consists of 10,000 files of 10 s length.…”
Section: Training: Wideband AEC and PF (mentioning)
confidence: 99%
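For orientation, the sketch below shows the kind of two-stage structure the excerpt refers to: a linear echo canceller (AEC) front end followed by a separately trained post-filter (PF). The toy NLMS canceller and identity post-filter are illustrative stand-ins under assumed parameters, not the cited system or its training procedure.

```python
import numpy as np

def linear_aec(mic, farend, taps=512, mu=0.1):
    """Toy NLMS echo canceller illustrating the first (linear AEC) stage."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = farend[n]
        echo_hat = w @ buf                       # estimated echo sample
        err = mic[n] - echo_hat                  # residual after cancellation
        w += mu * err * buf / (buf @ buf + 1e-8) # normalized LMS update
        out[n] = err
    return out

def post_filter(residual):
    """Placeholder for the post-filter that would be trained separately on
    the AEC output (here simply the identity)."""
    return residual

mic = np.random.default_rng(1).standard_normal(16_000)     # placeholder microphone signal
farend = np.random.default_rng(2).standard_normal(16_000)  # placeholder far-end reference
enhanced = post_filter(linear_aec(mic, farend))
```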