ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp43922.2022.9747215

ICASSP 2022 Acoustic Echo Cancellation Challenge

Abstract: The ICASSP 2023 Speech Signal Improvement Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. The speech signal quality can be measured with SIG in ITU-T P.835 and is still a top issue in audio communication and conferencing systems. For example, in the ICASSP 2022 Deep Noise Suppression challenge, the improvement in the background and overall quality is impressive, but the improvement in the speech signal is statistically zero. To improve th…

Cited by 53 publications (40 citation statements) | References 37 publications
“…For example, the recent deep noise suppression challenges [3] target at speech enhancement in a monaural teleconferencing setup, requiring a processing latency less than 40 ms on a specified Intel i5 processor. Similar latency requirements exist in other related challenges [4], [5]. The recent Clarity challenge [6] aims at multi-microphone speech enhancement in a hearing aid setup, requiring an algorithmic latency of at maximum 5 ms.…”
Section: Introduction (mentioning)
confidence: 92%
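As a rough illustration of what such latency constraints mean for a frame-based enhancer, the sketch below computes the algorithmic latency implied by an assumed STFT window and hop; the 20 ms / 10 ms framing and 16 kHz rate are illustrative assumptions, not parameters of any of the cited challenges.

```python
# Illustrative framing only -- not the settings of any cited challenge.
sample_rate_hz = 16_000
window_len = 320   # 20 ms analysis/synthesis window
hop_len = 160      # 10 ms hop (50% overlap)

# For a causal overlap-add STFT enhancer, the algorithmic latency is on the
# order of one window: a frame can only be synthesized once its last input
# sample has arrived.
algorithmic_latency_ms = 1000 * window_len / sample_rate_hz
print(f"algorithmic latency ≈ {algorithmic_latency_ms:.0f} ms")  # 20 ms
```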
“…For training data, we created a system id dataset by convolving the far-end speech recordings from the single-talk portion of the Microsoft AEC Challenge [80] with room impulse responses (RIRs) from [81]. At test time, we truncate all RIRs to 1024 taps.…”
Section: B. Experimental Design (mentioning)
confidence: 99%
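The excerpt above describes building a system-identification dataset by convolving far-end speech with room impulse responses (RIRs) and truncating the RIRs to 1024 taps. The sketch below shows that kind of convolution in generic NumPy/SciPy code; the synthetic placeholder signals stand in for the actual recordings of [80] and RIRs of [81], which are not reproduced here.

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)

# Placeholders standing in for a far-end recording from [80] and an RIR
# from [81]; the real dataset files are not used here.
fs = 16_000
farend = rng.standard_normal(10 * fs)                              # 10 s of "far-end speech"
rir = rng.standard_normal(4096) * np.exp(-np.arange(4096) / 800.0)  # decaying synthetic RIR

# Truncate the RIR to 1024 taps, as done at test time in the cited work.
rir = rir[:1024]

# The synthetic echo is the far-end signal convolved with the truncated RIR.
echo = fftconvolve(farend, rir, mode="full")[: len(farend)]
```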
“…When averaging, we discard silent frames using an energy-threshold VAD. In scenes with near-end speech, we use STOI ∈ [0, 1] to measure the preservation of near-end speech. With respect to datasets for single-talk, double-talk, and double-talk with path-change experiments, we re-mix the synthetic fold of [80] with impulse responses from [81]. We partition [81] into non-overlapping train, test, and validation folds and set the signal-to-echo-ratio randomly between [−10, 10] with uniform distribution.…”
Section: B. Experimental Design (mentioning)
confidence: 99%
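The passage above mentions two concrete steps: discarding silent frames with an energy-threshold VAD, and mixing near-end speech and echo at a signal-to-echo ratio (SER) drawn uniformly from [−10, 10] dB. The sketch below is a minimal version of both, assuming a simple per-frame energy threshold and power-based SER scaling; the frame length, threshold, and placeholder signals are illustrative, not the authors' exact recipe.

```python
import numpy as np

def energy_vad(x, frame_len=320, threshold_db=-40.0):
    """Boolean mask of frames whose energy is within threshold_db of the
    loudest frame -- a simple energy-threshold VAD for discarding silence."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10 * np.log10(np.mean(frames**2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

def mix_at_ser(near_end, echo, ser_db):
    """Scale the echo so the near-end-to-echo power ratio equals ser_db (dB),
    then return the microphone mixture."""
    near_pow = np.mean(near_end**2) + 1e-12
    echo_pow = np.mean(echo**2) + 1e-12
    gain = np.sqrt(near_pow / (echo_pow * 10 ** (ser_db / 10)))
    return near_end + gain * echo

rng = np.random.default_rng(0)
near_end = rng.standard_normal(16_000)   # placeholder near-end speech
echo = rng.standard_normal(16_000)       # placeholder echo signal
ser_db = rng.uniform(-10.0, 10.0)        # SER drawn uniformly from [-10, 10] dB
mic = mix_at_ser(near_end, echo, ser_db)
active = energy_vad(mic)                 # frames kept when averaging metrics
```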
“…1. Again, AEC and PF are trained in two separate steps, now, however, on the ICASSP 2022 AEC Challenge synthetic FB dataset [24], further referenced as Dsyn, which also consists of 10,000 files of 10 s length.…”
Section: Training: Wideband AEC and PF (mentioning)
confidence: 99%
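For orientation, the sketch below shows the kind of two-stage structure the excerpt refers to: a linear echo canceller (AEC) front end followed by a separately trained post-filter (PF). The toy NLMS canceller and identity post-filter are illustrative stand-ins under assumed parameters, not the cited system or its training procedure.

```python
import numpy as np

def linear_aec(mic, farend, taps=512, mu=0.1):
    """Toy NLMS echo canceller illustrating the first (linear AEC) stage."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = farend[n]
        echo_hat = w @ buf                       # estimated echo sample
        err = mic[n] - echo_hat                  # residual after cancellation
        w += mu * err * buf / (buf @ buf + 1e-8) # normalized LMS update
        out[n] = err
    return out

def post_filter(residual):
    """Placeholder for the post-filter that would be trained separately on
    the AEC output (here simply the identity)."""
    return residual

mic = np.random.default_rng(1).standard_normal(16_000)     # placeholder microphone signal
farend = np.random.default_rng(2).standard_normal(16_000)  # placeholder far-end reference
enhanced = post_filter(linear_aec(mic, farend))
```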