2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2018
DOI: 10.23919/apsipa.2018.8659692

Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network

Cited by 35 publications (22 citation statements)
References 24 publications
“…The phase of noisy speech is utilized along with the enhanced speech magnitude to reconstruct the time-domain signal with inverse short-time Fourier transform (iSTFT). Despite some inspiring results achieved [1][2][3][4], there are still two main limitations in T-F domain methods. First, Fourier transform operation is an additional overhead which hinders fast speech denoising.…”
Section: Introduction (mentioning)
confidence: 99%
“…Assessing the quality of audio signals is an important consideration in many audio and multimedia applications, such as speech recognition, high-quality music recording, and machine fault detection. By now, there have been some studies in audio quality assessment [1,2,3,4]. Fu et al [2] developed a non-intrusive speech quality evaluation model to predict PESQ scores using a BLSTM model.…”
Section: Introduction (mentioning)
confidence: 99%
“…Fu et al [2] developed a non-intrusive speech quality evaluation model to predict PESQ scores using a BLSTM model. Avila et al, [3] investigated the applicability of three neural network-based approaches for non-intrusive audio quality assessment based on mean opinion score(MOS) estimation [5]. These previous studies mainly focused on quality estimations of audio recordings by predicting a quality score.…”
Section: Introduction (mentioning)
confidence: 99%
“…Many existing DNN-based speech denoising algorithms are implemented in the time-frequency (T-F) domain, where only the spectral magnitude is enhanced and the noisy phase is used for speech reconstruction. Some of those works such as [1]- [4], have achieved inspiring results. However, there are two main limitations in the T-F domain methods.…”
Section: Introduction (mentioning)
confidence: 99%