ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053214
Speech Enhancement Using Self-Adaptation and Multi-Head Self-Attention

Abstract: This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance. Conventional studies of deep neural network (DNN)-based speech enhancement mainly focus on building a speaker-independent model. Meanwhile, in speech applications including speech recognition and synthesis, it is known that model adaptation to the target speaker improves accuracy. Our research question is wh…
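A minimal PyTorch sketch of the idea the abstract describes, not the authors' implementation: a speaker embedding is extracted from the noisy test utterance itself (no enrollment data) and conditions a multi-head self-attention mask estimator. All module names, layer sizes, and the additive conditioning scheme are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): self-adaptive enhancement with
# multi-head self-attention, conditioned on a speaker embedding taken from
# the same noisy utterance being enhanced.
import torch
import torch.nn as nn


class SelfAdaptiveEnhancer(nn.Module):
    def __init__(self, n_freq=257, d_model=256, n_heads=4, n_layers=4, d_spk=128):
        super().__init__()
        self.in_proj = nn.Linear(n_freq, d_model)
        # Auxiliary speaker encoder: mean-pools frame features into one embedding.
        self.spk_encoder = nn.Sequential(
            nn.Linear(n_freq, d_spk), nn.ReLU(), nn.Linear(d_spk, d_spk)
        )
        self.spk_proj = nn.Linear(d_spk, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mask_head = nn.Sequential(nn.Linear(d_model, n_freq), nn.Sigmoid())

    def forward(self, noisy_spec):  # noisy_spec: (batch, frames, n_freq) magnitudes
        # Speaker representation from the test utterance itself: no enrollment needed.
        spk_emb = self.spk_encoder(noisy_spec).mean(dim=1)           # (batch, d_spk)
        x = self.in_proj(noisy_spec) + self.spk_proj(spk_emb)[:, None, :]
        x = self.encoder(x)                                          # multi-head self-attention
        mask = self.mask_head(x)                                     # (batch, frames, n_freq)
        return mask * noisy_spec                                     # enhanced magnitude


if __name__ == "__main__":
    net = SelfAdaptiveEnhancer()
    enhanced = net(torch.randn(2, 100, 257).abs())
    print(enhanced.shape)  # torch.Size([2, 100, 257])
```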

Cited by 106 publications (47 citation statements). References 30 publications.
“…Several prior works for speaker extraction have studied various cues about the target speaker, such as voiceprint [11,20,21], lip movement [12,22], facial appearance [23], and spatial information [13].…”
Section: Relation To Prior Work (mentioning, confidence: 99%)
“…The related works are described as follows. Self-attention blocks have been applied to speech enhancement [12,17]. Self-attention models the responses between positions in a sequence, whereas CCBAM focuses on cross-channel and spatial information of feature maps.…”
Section: Introduction (mentioning, confidence: 99%)
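To make the contrast in the quote above concrete, here is a generic channel-plus-spatial attention block in the style of CBAM, operating on feature maps rather than on sequence positions. This is a stand-in sketch; the CCBAM of the citing paper is a variant whose exact design is not reproduced here.

```python
# Hypothetical CBAM-style block: channel attention picks which feature maps
# matter, spatial attention picks which time-frequency positions matter.
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels)
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                                  # x: (batch, C, T, F)
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))                           # (batch, C)
        mx = x.amax(dim=(2, 3))                            # (batch, C)
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch.view(b, c, 1, 1)                        # channel-wise reweighting
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(sp))    # spatial reweighting


if __name__ == "__main__":
    out = ChannelSpatialAttention(channels=16)(torch.randn(2, 16, 100, 257))
    print(out.shape)  # torch.Size([2, 16, 100, 257])
```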
“…However, a corresponding noise segment from the same environment needs to be prepared to create the noise embedding through an embedding subnetwork, which makes the speech enhancement process inconvenient at inference time. Research on speaker-aware [13,14] and signal-to-noise-ratio (SNR)-aware [15] algorithms has also been proposed to improve the denoising performance of speech enhancement models.…”
Section: Introduction (mentioning, confidence: 99%)
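The auxiliary-embedding idea referenced in the last quote can be sketched as an embedding subnetwork that summarizes a reference segment (a noise clip, or, in speaker-aware and self-adaptive variants, the test utterance itself) into a vector that modulates the enhancement features. The FiLM-style scale-and-shift conditioning and all names below are illustrative assumptions, not taken from the cited papers.

```python
# Hypothetical sketch: an embedding subnetwork plus feature-wise conditioning.
import torch
import torch.nn as nn


class EmbeddingSubnetwork(nn.Module):
    """Pools a reference spectrogram into a fixed-size conditioning vector."""
    def __init__(self, n_freq=257, d_emb=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_freq, d_emb), nn.ReLU(),
                                 nn.Linear(d_emb, d_emb))

    def forward(self, ref_spec):                       # (batch, frames, n_freq)
        return self.net(ref_spec).mean(dim=1)          # (batch, d_emb)


class FiLMConditioner(nn.Module):
    """Scales and shifts enhancement features using the auxiliary embedding."""
    def __init__(self, d_emb=128, d_feat=256):
        super().__init__()
        self.to_scale = nn.Linear(d_emb, d_feat)
        self.to_shift = nn.Linear(d_emb, d_feat)

    def forward(self, feats, emb):                     # feats: (batch, frames, d_feat)
        return feats * self.to_scale(emb)[:, None, :] + self.to_shift(emb)[:, None, :]


if __name__ == "__main__":
    emb = EmbeddingSubnetwork()(torch.randn(2, 50, 257))   # reference segment
    out = FiLMConditioner()(torch.randn(2, 100, 256), emb)  # conditioned features
    print(out.shape)  # torch.Size([2, 100, 256])
```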