“…In this typical two-stage arrangement for speech enhancement, the use of a DNN as the second stage (RES and NR) has gained increasing attention, with early investigations of feed-forward networks [1,2], convolutional networks bringing further improvements more recently [3,4], some even being fully synergistic with the first stage [5], and many more. In the meantime, fully learned deep AEC approaches have also been proposed, in which a single network incorporates the tasks of AEC, RES, and NR, e.g., [6,7], or further investigated in [8]. Although these show impressive suppression performance, they are often accompanied by some near-end speech degradation, and, at least for now, hybrid approaches are still one step ahead, as can be seen from the leading model of the AEC Challenge at ICASSP 2021 being a hybrid approach [9].…”