2020
DOI: 10.48550/arxiv.2005.07631
Preprint

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Abstract: Acoustic echo cannot be entirely removed by linear adaptive filters due to the nonlinear relationship between the echo and the far-end signal. Usually, a post-processing module is required to further suppress the echo. In this paper, we propose a residual echo suppression method based on a modification of the fully convolutional time-domain audio separation network (Conv-TasNet). Both the residual signal of the linear acoustic echo cancellation system and the output of the adaptive filter are adopted to form multiple…

citations
Cited by 2 publications
(2 citation statements)
references
References 23 publications
“…These distortions can vary at different volumes, temperatures and between different loudspeakers. A common practice is to apply a functional non-linearity to mimic loudspeaker distortion as in [18]. Of course, the highest fidelity way to capture these effects in training data is to record echoed reference outputs in real rooms, but this has the considerable downside of being expensive and time consuming.…”
Section: Data Preparation (mentioning)
confidence: 99%
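The "functional non-linearity" mentioned in the statement above can be illustrated with a minimal sketch. The exact function used in [18] is not given here, so the hard clip followed by a tanh saturation below is only an assumed stand-in, and the names `hard_clip`, `soft_saturate`, `distort_far_end`, `x_max`, and `gain` are hypothetical.

```python
import math

def hard_clip(x, x_max):
    """Limit each sample to the range [-x_max, x_max]."""
    return [max(-x_max, min(x_max, s)) for s in x]

def soft_saturate(x, gain=2.0):
    """Memoryless tanh saturation (illustrative choice, not taken from [18]).

    Dividing by tanh(gain) maps an input of 1.0 to an output of 1.0.
    """
    return [math.tanh(gain * s) / math.tanh(gain) for s in x]

def distort_far_end(x, x_max=0.8, gain=2.0):
    """Apply the assumed loudspeaker model: hard clip, then soft saturation."""
    return soft_saturate(hard_clip(x, x_max), gain)

# A 440 Hz tone sampled at 16 kHz stands in for the far-end signal.
fs = 16000
far_end = [math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]
distorted = distort_far_end(far_end)
```

In a simulated training set, a distorted far-end signal of this kind would typically be convolved with a room impulse response to form the echo; that step is omitted here.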
“…Lately, the objective deep noise suppression mean opinion score (DNSMOS) metric has been proposed to estimate human ratings and has shown great accuracy [12]. Regarding the task of RES, speech quality during double-talk is traditionally evaluated using the objective signal-to-distortion ratio (SDR) metric [13], e.g., in [14][15][16][17][18][19]. Unfortunately, the SDR is affected by both desired-speech distortion and residual-echo presence, which renders it unreliable in predicting the DNSMOS and unreliable in predicting human perception of speech quality [12].…”
Section: Introduction (mentioning)
confidence: 99%
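The point that SDR mixes desired-speech distortion with residual echo can be seen in a small sketch. It assumes the plain energy-ratio form, SDR = 10 log10(||s||² / ||s − ŝ||²), which may differ from the exact definition in [13]; the function name `sdr_db` and the test signals are hypothetical.

```python
import math

def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB between a reference and an estimate."""
    num = sum(r * r for r in reference)
    den = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if den == 0:
        return math.inf
    return 10 * math.log10(num / den)

fs = 16000
n_samples = 1600
# Stand-ins for near-end speech and a weak residual echo.
speech = [math.sin(2 * math.pi * 200 * n / fs) for n in range(n_samples)]
echo = [0.1 * math.sin(2 * math.pi * 700 * n / fs) for n in range(n_samples)]

# Case 1: speech is intact, but residual echo leaks through.
with_residual_echo = [s + e for s, e in zip(speech, echo)]
# Case 2: no echo, but the desired speech itself is attenuated.
attenuated = [0.9 * s for s in speech]

sdr_echo = sdr_db(speech, with_residual_echo)
sdr_attenuated = sdr_db(speech, attenuated)
```

With these signals, both cases come out at about 20 dB, so the score alone does not reveal which kind of degradation is present.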