Improved Speech Enhancement Using a Time-Domain GAN with Mask Learning

Lin, Ju; Niu, Sufeng; Wijngaarden, A.J. van; McClendon, Jerome; Smith, Melinda; Wang, Kuang-Ching

doi:10.21437/interspeech.2020-1946

Cited by 14 publications

(5 citation statements)

References 16 publications


“…Speech enhancement approaches can also be categorized as mapping-based or masking-based, used in [1], [2], [3], [4] and [5], [6], [7], respectively. In the mapping-based approach, the algorithm attempts to learn a mapping from the noisy input to the clean target audio.…”

Section: Speech Enhancement Methods Using GAN (mentioning)

confidence: 99%

Review of Speech Enhancement Methods using Generative Adversarial Networks

Skariah, Thomas

2023

2023 International Conference on Control, Communication and Computing (ICCC)


The goal of speech enhancement is to restore the quality of noise-affected speech by removing the noise, improving both the intelligibility and the overall perceptual quality of the degraded signal. Because the noise is often highly non-stationary, speech enhancement is a difficult problem. In recent years, Generative Adversarial Networks (GANs) have been applied to speech enhancement. This review covers GAN-based speech enhancement algorithms, datasets, and evaluation metrics. We also discuss open issues and compare the performance of the different speech enhancement methods.



“…These speech and noise spectrograms are used to compute the speech mask loss. The TIMIT dataset is used, and the method outperforms DNN-based speech enhancement and SEGAN [14].…”

Section: Time Domain GAN With Mask Learning (mentioning)

confidence: 99%

Study of Generative Adversarial Networks for Acoustic Signal Enhancement: A Review

2022

IJETER


Acoustic signal enhancement is an important research topic with many applications, such as cochlear implants, speech and speaker recognition, hearing aids, and mobile phones. The signals processed by these systems are always susceptible to noise, so algorithms are required to extract the clean signal from the noisy one. Nowadays, deep neural networks are the most sought-after tool for signal enhancement. The Generative Adversarial Network (GAN) is one of the recent approaches applied to signal enhancement, although most GAN work so far has been in image and video processing. To the best of my knowledge, no review of the use of GANs for acoustic signal enhancement has been done. This paper reviews the use of GANs for acoustic signal enhancement, with speech as the acoustic signal. It summarizes the basic GAN architectures and their limitations, the feature sets used as input to GANs, performance evaluation measures, and future directions.


“…Although these methods address the false-extraction problem to some extent, they do not focus on distinguishing whether the target speaker is present or absent, resulting in suboptimal performance. Some methods introduce additional information to verify the speaker's presence, such as speaker activity information [13,14] or visual cues [7], which limits the application of these methods.…”

Section: Introduction (mentioning)

confidence: 99%

Gated Cross-Attention for Universal Speaker Extraction: Pay attention to the Speaker’s Presence

Zhang, Li, Liu et al.

2023

Preprint


Current speaker extraction models have achieved good performance in extracting target speech from highly overlapped multi-talker speech. In real-world applications, however, the multi-talker speech is sparsely overlapped and the target speaker may be absent from the speech mixture, making it difficult for the model to extract the desired speech. Universal speaker extraction has been proposed to solve this problem by evaluating the quality of both estimated speech signals and silence. However, the design of existing universal speaker extraction models does not take into account distinguishing the presence or absence of the target speaker. In this paper, we propose a gated cross-attention network for universal speaker extraction. In our model, the cross-attention mechanism learns the correlation between the target speaker and the speech to determine whether the target speaker is present. According to this correlation, the gate mechanism makes the model focus on extracting speech when the target is present, while filtering out the features when the target is absent. We also propose a joint loss function to optimize the network in both target-present and target-absent scenarios. We conducted experiments on the LibriMix dataset with various scenarios and evaluated the performance in terms of speech quality and speaker extraction error rate. The experimental results show that our proposed method outperforms the baselines in all of the scenarios.

