ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
DOI: 10.1109/icassp43922.2022.9747120
MANNER: Multi-View Attention Network For Noise Erasure

Cited by 24 publications (7 citation statements)
References: 16 publications
“…Specifically, we use the model and training setup defined in [50], trained on the SC09 dataset. For speech enhancement, we compare to MANNER [51], a recent high-performing speech enhancement model operating in the time-domain. Since we have no paired clean/noisy utterances for the SC09 dataset, we follow the technique from [52] to construct a speech enhancement dataset.…”
Section: B. Baseline Systems (mentioning)
confidence: 99%
“…Unseen generative task performance of ASGAN compared to task-specific systems (AutoVC [50] for voice conversion, MANNER [51] for speech enhancement).…”
(mentioning)
confidence: 99%
“…MANNER [38] is an end-to-end multi-view attention network that currently ranks 6th in terms of PESQ on the Voice-bank+DEMAND dataset [79]. It presents a U-Net [72]-based architecture, whose blocks combine channel attention [80] with local and global attention along two signal scales, similar to dual-path models [81].…”
Section: MANNER (mentioning)
confidence: 99%
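The excerpt above describes blocks that combine channel attention with local and global attention. As a rough illustration only (not the authors' implementation), a squeeze-and-excitation-style channel attention gate over a 1-D signal can be sketched in NumPy; the weight shapes, the reduction ratio `r`, and the function name `channel_attention` are assumptions for this sketch:

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation-style channel attention for a 1-D signal.
    x: (channels, time); w1: (channels//r, channels); w2: (channels, channels//r).
    Hypothetical sketch; MANNER's actual block differs in detail."""
    s = x.mean(axis=1)                   # squeeze: global average pool over time
    h = np.maximum(w1 @ s, 0.0)          # excitation: bottleneck projection + ReLU
    a = 1.0 / (1.0 + np.exp(-(w2 @ h)))  # per-channel sigmoid gate in (0, 1)
    return x * a[:, None]                # rescale each channel by its gate

C, T, r = 8, 100, 2
rng = np.random.default_rng(0)
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = channel_attention(x, w1, w2)  # same shape as x, channels re-weighted
```

Because the gate is a sigmoid, each channel is attenuated rather than amplified; the network learns which channels to emphasize relative to the others.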
“…The generalization gap is then averaged across folds for a more accurate estimation. We use this framework to evaluate the influence of the speech, noise and room dimensions on the generalization performance of four speech enhancement systems: an FFNN-based system, Conv-TasNet [36], DCCRN [37] and MANNER [38]. Combined mismatches along multiple dimensions are also investigated.…”
Section: Introduction (mentioning)
confidence: 99%
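The fold-averaging step mentioned in the excerpt above can be sketched minimally; the function name and the definition of the gap (matched-condition score minus mismatched-condition score) are assumptions here, and the cited paper's exact metric may differ:

```python
def mean_generalization_gap(matched_scores, mismatched_scores):
    """Per fold, compute (score under the matched/seen condition) minus
    (score under the mismatched/unseen condition), then average the
    per-fold gaps. Hypothetical sketch of cross-fold averaging."""
    gaps = [m - mm for m, mm in zip(matched_scores, mismatched_scores)]
    return sum(gaps) / len(gaps)

# e.g. a quality score per fold, matched vs. mismatched noise conditions
gap = mean_generalization_gap([2.8, 3.1, 2.9], [2.5, 2.6, 2.7])
```

Averaging over folds reduces the variance of the estimate relative to a single train/test split, which is the motivation the excerpt gives.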
“…Cao et al. proposed a generative adversarial network to model temporal and frequency correlations and achieved extremely high performance [22]. Park et al. proposed a multi-view attention network to improve the accuracy of feature extraction [23].…”
Section: Introduction (mentioning)
confidence: 99%