2023
DOI: 10.1109/taslp.2022.3225649
A Time-Frequency Attention Module for Neural Speech Enhancement

Abstract: Speech enhancement plays an essential role in a wide range of speech processing applications. Recent studies on speech enhancement tend to investigate how to effectively capture the long-term contextual dependencies of speech signals to boost performance. However, these studies generally neglect the time-frequency (T-F) distribution information of speech spectral components, which is equally important for speech enhancement. In this paper, we propose a simple yet very effective network module, which we term th…

Cited by 23 publications (9 citation statements)
References 73 publications
“…Through this selective information aggregation mechanism, the speech enhancement network can better preserve the desired speech characteristics and remove uncorrelated noise information more effectively. Currently, there are three popular ways to compute the attention vector in speech enhancement deep networks: channel attention [25], spatial attention [25], and time-frequency (T-F) attention [26]. By using different perspectives to discriminate the importance of different contextual spectral information, each way has its unique advantage in boosting network performance.…”
Section: Triple-Attention-Based TCNN (TA-TCNN)
Mentioning confidence: 99%
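The channel-attention idea mentioned in the passage above can be sketched as a squeeze-and-excitation-style gate. This is a minimal illustrative NumPy version, not the cited papers' exact design: the shapes, the bottleneck ratio `r`, and the pooling choice are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation-style channel attention (illustrative).

    feat: (C, T, F) feature map (channels, time frames, frequency bins).
    w1:   (C, C//r) and w2: (C//r, C) stand in for learnable weights.
    Returns the feature map rescaled by per-channel gates in (0, 1).
    """
    # Squeeze: global average pool over the time-frequency plane -> (C,)
    z = feat.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU) followed by a sigmoid gate -> (C,)
    gate = sigmoid(np.maximum(z @ w1, 0.0) @ w2)
    # Rescale each channel by its gate
    return feat * gate[:, None, None]

rng = np.random.default_rng(0)
C, T, F, r = 8, 10, 16, 2
feat = rng.standard_normal((C, T, F))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 10, 16)
```

Because every gate lies strictly in (0, 1), the module can only attenuate channels, never amplify them; the network learns which channels to suppress.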
“…As mentioned earlier, in this paper, we also introduce the T-F attention presented in [26] and exploit it to characterize a salient energy distribution of speech in the time and frequency dimensions. As shown in the right part of Figure 4, the T-F attention block includes two parallel attention paths: time-dimension attention and frequency-dimension attention.…”
Section: Time-Frequency (T-F) Attention
Mentioning confidence: 99%
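The two parallel attention paths described above can be sketched as two 1-D gates whose outer product forms a full T-F mask. This is a hedged NumPy illustration of the idea, assuming simple mean-pooling and single linear layers (`w_t`, `w_f`) in place of the paper's actual attention paths:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tf_attention(spec, w_t, w_f):
    """Two parallel 1-D attention paths over a (T, F) spectrogram.

    Time path:      pool over frequency -> per-frame gate a_t, shape (T,)
    Frequency path: pool over time      -> per-bin gate a_f, shape (F,)
    The outer product a_t a_f^T forms a T-F mask that reweights the
    input, emphasizing salient time-frequency regions.
    w_t (T, T) and w_f (F, F) stand in for the paths' learnable layers.
    """
    a_t = sigmoid(spec.mean(axis=1) @ w_t)   # (T,) time-dimension attention
    a_f = sigmoid(spec.mean(axis=0) @ w_f)   # (F,) frequency-dimension attention
    return spec * np.outer(a_t, a_f)

rng = np.random.default_rng(1)
T, F = 6, 9
spec = np.abs(rng.standard_normal((T, F)))   # magnitude spectrogram
w_t = rng.standard_normal((T, T)) * 0.1
w_f = rng.standard_normal((F, F)) * 0.1
out = tf_attention(spec, w_t, w_f)
print(out.shape)  # (6, 9)
```

The two paths are cheap (each attends along a single dimension), yet their product still yields a distinct weight for every time-frequency bin.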
“…where $W^{Q}_{h}$, $W^{K}_{h}$, and $W^{V}_{h}$ are the learnable projection parameters. The attention mechanism has recently attracted significant interest and has been the subject of several studies [46], [47]. These studies have shown that attention mechanisms can effectively model the distributions of speech signals across the frequency and time dimensions.…”
Section: A Multi-Head Self-Attention Transformer With Time-Frequency ...
Mentioning confidence: 99%
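The projections referred to in the passage above follow the standard self-attention recipe: queries, keys, and values are obtained by multiplying the input with learnable matrices, then combined with scaled dot-product attention. A minimal single-head NumPy sketch (shapes and dimensions are illustrative, not the cited transformer's configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    """One self-attention head: Q = X Wq, K = X Wk, V = X Wv, then
    softmax(Q K^T / sqrt(d)) V. Wq, Wk, Wv are the learnable
    projection matrices referred to in the quoted passage."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))   # (T, T) attention weights
    return weights @ V

rng = np.random.default_rng(2)
T, d_model, d_head = 5, 8, 4
X = rng.standard_normal((T, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out = attention_head(X, Wq, Wk, Wv)
print(out.shape)  # (5, 4)
```

A multi-head variant simply runs several such heads with separate `Wq`, `Wk`, `Wv` and concatenates their outputs before a final projection.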
“…Inspired by speech enhancement techniques (Zhang et al. 2020, 2022) that restore clear speech from noisy recordings, we aim to mitigate the adverse effects of lip occlusion on audio-visual speech recognition by restoring occluded lips. Considering the partially occluded lip shown in Fig.…”
Section: Introduction
Mentioning confidence: 99%