AEC in a NetShell: On Target and Topology Choices for FCRN Acoustic Echo Cancellation

Franzen, Jan; Seidel, Ernst; Fingscheidt, Tim

doi:10.1109/icassp39728.2021.9414715

Cited by 17 publications

(7 citation statements)

References 25 publications


“…We use three measures to evaluate all approaches: Speech quality in terms of wideband PESQ MOS LQO [26,27], SNR improvement (∆SNR) in [dB] for noise reduction, and echo suppression by echo return loss enhancement (ERLE) in [dB] computed as in [8].…”

Section: Results (mentioning)

confidence: 99%
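The ERLE measure named in the statement above can be illustrated with a minimal sketch. It uses the common textbook definition (echo power over residual echo power, in dB). The exact computation "as in [8]" is not given here, so the function name `erle_db`, the `eps` guard, and the toy signals are assumptions for illustration only.

```python
import math


def erle_db(echo, residual_echo, eps=1e-12):
    """Echo return loss enhancement in dB (textbook form, assumed here).

    echo:          echo component d(n) at the microphone
    residual_echo: echo component remaining after cancellation
    eps:           small constant guarding against division by zero
    """
    if len(echo) != len(residual_echo):
        raise ValueError("signals must have the same length")
    power_echo = sum(v * v for v in echo) / len(echo)
    power_residual = sum(v * v for v in residual_echo) / len(residual_echo)
    return 10.0 * math.log10((power_echo + eps) / (power_residual + eps))


# Toy example: the residual has one tenth of the echo amplitude,
# i.e. one hundredth of its power.
echo = [0.5, -0.5, 0.5, -0.5]
residual = [0.05, -0.05, 0.05, -0.05]
print(round(erle_db(echo, residual), 2))
```

A tenfold amplitude reduction corresponds to about 20 dB of ERLE, which is what the toy example evaluates to.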

“…The following rows provide two approaches as baseline comparisons: First, a single-stage 'all-in-one' approach trains the FCRN to perform all tasks (AEC+RES+NR) at once, as proposed in [6] using direct estimation and further elaborated in [8]. As for the Kalman filter, the only available inputs to this approach are the microphone signal Y and reference signal X (for brevity denoted as vectors without frame and bin index here).…”

Section: Results (mentioning)

confidence: 99%

“…The acoustic setup is outlined in Figure 1 and shall cover typical single- and double-talk scenarios. Following the procedure described in [8,6], the TIMIT dataset [15] is used to set up far-end speech x(n) and near-end speech s(n) at a sampling rate of 16 kHz. At the microphone, near-end speech s(n) is superimposed with background noise n(n) and echo signals d(n).…”

Section: System Model and Network Topology (mentioning)

confidence: 99%

“…Echoes d(n) are generated by imposing loudspeaker nonlinearities [6] on far-end signals x(n) and applying impulse responses (IRs) created with the image method [16]. As in related work [8], IRs are set to a length of 512 samples with reverberation times T60 ∈ {0.2, 0.3, 0.4} s for training and validation, and 0.2 s for test mixtures. Background noise n(n) from the QUT dataset [17] is used for training and validation.…”

Section: System Model and Network Topology (mentioning)

confidence: 99%
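The signal model quoted in the two statements above (echo d(n) from a nonlinearly distorted far-end signal filtered by a room impulse response, then y(n) = s(n) + n(n) + d(n) at the microphone) can be sketched roughly as follows. This is only an illustration under stated assumptions: the hard clipper stands in for the loudspeaker nonlinearities of [6], the two-tap impulse response stands in for a 512-sample image-method IR, and all function names are hypothetical.

```python
def hard_clip(signal, limit):
    """Placeholder loudspeaker nonlinearity (assumption, not the model of [6])."""
    return [max(-limit, min(limit, v)) for v in signal]


def convolve_truncated(signal, impulse_response):
    """Convolve with an impulse response, keeping the input length."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            if i + j < len(out):
                out[i + j] += x * h
    return out


def microphone_signal(far_end, near_end, noise, impulse_response, clip_limit=0.8):
    """Return (y, d): microphone signal y(n) = s(n) + n(n) + d(n) and echo d(n)."""
    echo = convolve_truncated(hard_clip(far_end, clip_limit), impulse_response)
    mic = [s + n + d for s, n, d in zip(near_end, noise, echo)]
    return mic, echo


# Toy far-end single-talk example: no near-end speech, no noise.
far_end = [1.0, 0.0, 0.0, 0.0]
silence = [0.0, 0.0, 0.0, 0.0]
toy_ir = [0.5, 0.25]  # hypothetical two-tap impulse response
mic, echo = microphone_signal(far_end, silence, silence, toy_ir)
print(mic)
```

In far-end single talk with no noise, the microphone signal equals the echo, which makes such a setup a convenient sanity check before mixing in near-end speech and background noise.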

“…In this typical two-stage arrangement for speech enhancement, the application of a DNN as second stage (RES and NR) gained increasing attention with early investigations of feed-forward networks [1,2], convolutional networks bringing further improvements more recently [3,4], some even being fully synergistic with the first stage [5], and many more. In the meantime, also fully learned deep AEC approaches were proposed, where a single network incorporates the tasks of AEC, RES, and NR, e.g., [6,7] or further investigated in [8]. Although showing an impressive suppression performance, however, these are often accompanied by some near-end speech degradation and – at least for now – hybrid approaches are still one step ahead as can be seen with the leading model of the AEC Challenge on ICASSP 2021 being a hybrid approach [9].…”

Section: Introduction (mentioning)

confidence: 99%


Deep Residual Echo Suppression and Noise Reduction: A Multi-Input FCRN Approach in a Hybrid Speech Enhancement System

Franzen, Fingscheidt

2021

Preprint

Self Cite


