2012
DOI: 10.1109/tasl.2011.2157685

Spectral Magnitude Minimum Mean-Square Error Estimation Using Binary and Continuous Gain Functions

Abstract: Recently, binary mask techniques have been proposed as a tool for retrieving a target speech signal from a noisy observation. A binary gain function is applied to time-frequency tiles of the noisy observation in order to suppress noise-dominated and retain target-dominated time-frequency regions. When implemented using discrete Fourier transform (DFT) techniques, the binary mask techniques can be seen as a special case of the broader class of DFT-based speech enhancement algorithms, for which the appl…

Cited by 41 publications (25 citation statements)
References: 27 publications
“…We further assume that the speech and noise are statistically independent. These signals are windowed and transformed into the frequency domain by applying the short-time discrete Fourier transform (DFT), leading to (1), whose terms denote the noisy speech, target speech and noise DFT coefficients, respectively, at a given frequency-bin index, time-frame index and microphone. We assume the DFT coefficients to be independent in time and frequency, which allows us to omit the time and frequency indices for brevity.…”
Section: Problem Formulation
Citation type: mentioning
confidence: 99%
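The additive model in the statement above can be checked numerically. The following is a minimal sketch, assuming a single microphone, a Hann window and synthetic stand-in signals; the names `stft`, `Y`, `S` and `N` are illustrative and are not taken from the citing paper. Because the short-time DFT is linear, the noisy coefficients equal the sum of the speech and noise coefficients in every time-frequency bin.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time DFT: Hann-windowed frames, one row per time frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, frequency bins)

rng = np.random.default_rng(0)
n_samples = 4096
t = np.arange(n_samples)
speech = np.sin(2 * np.pi * 0.05 * t)         # synthetic stand-in for target speech
noise = 0.3 * rng.standard_normal(n_samples)  # independent noise (assumed Gaussian)
noisy = speech + noise                        # time-domain noisy observation

Y, S, N = stft(noisy), stft(speech), stft(noise)

# Linearity of the DFT: noisy coefficient = speech coefficient + noise coefficient.
additive_model_holds = np.allclose(Y, S + N)
```

A multi-microphone version would repeat the same transform per channel; that extension is omitted here.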
“…Speech enhancement algorithms can be categorized into two classes: single-channel and multi-channel techniques. Although single-channel algorithms can improve quality and have been shown to be able to improve speech intelligibility to some extent [1], improvements are generally modest as they can utilize only the spectral information [2]-[4]. Multi-channel speech enhancement algorithms have in theory the potential to improve the speech quality and intelligibility by using both spectral and spatial information about the speech and the noise sources [5], [6].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The IBM in the DFT domain is defined using (4), by comparing the energy (squared-magnitude) of clean speech and noise at each T-F unit. When estimating the binary mask, a recently proposed MMSE-based mask estimator is used [16]. We use Type-II binary masks as defined in [16] that minimize the spectral 'squared-magnitude' MSE.…”
Section: E Comparison of FFT and Gammatone Filterbank Based Represen…
Citation type: mentioning
confidence: 99%
“…When estimating the binary mask, a recently proposed MMSE-based mask estimator is used [16]. We use Type-II binary masks as defined in [16] that minimize the spectral 'squared-magnitude' MSE. It has the same form as the spectral magnitude MMSE mask derived in [16], except that the spectral squared-magnitude MMSE gain function is used in place of the gain function used in [16].…”
Section: E Comparison of FFT and Gammatone Filterbank Based Represen…
Citation type: mentioning
confidence: 99%
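The energy comparison that defines the ideal binary mask (IBM) in the statements above can be sketched as follows. This is an illustration under stated assumptions: equation (4) of the citing paper is not reproduced on this page, so the 0 dB threshold and the function name `ideal_binary_mask` are hypothetical. It computes the oracle mask from known speech and noise coefficients, not the MMSE-based mask estimator of [16].

```python
import numpy as np

def ideal_binary_mask(S, N, threshold_db=0.0):
    """Return 1 where speech energy exceeds noise energy by threshold_db, else 0.

    S, N: complex DFT coefficients of clean speech and noise (same shape).
    threshold_db: local SNR threshold in dB (0 dB is an assumption).
    """
    speech_energy = np.abs(S) ** 2
    noise_energy = np.abs(N) ** 2
    threshold = 10.0 ** (threshold_db / 10.0)
    return (speech_energy > threshold * noise_energy).astype(float)

# Toy 2x2 grid of time-frequency units (values chosen for illustration only).
S = np.array([[2.0 + 0.0j, 0.1 + 0.1j],
              [0.0 + 1.0j, 0.5 + 0.0j]])
N = np.array([[0.5 + 0.5j, 1.0 + 0.0j],
              [0.0 + 2.0j, 0.1 - 0.1j]])

mask = ideal_binary_mask(S, N)
enhanced = mask * (S + N)  # keep target-dominated units, zero the rest
```

In this toy grid the speech-dominated units lie on the diagonal, so only those two noisy coefficients survive masking.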
“…Meanwhile, the algorithm derived and applied binary gain for the enhancement. The binary gain produces enhanced speech with low sound quality when compared to that produced by continuous (or soft) gain [10,11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
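One way to see why a continuous gain can outperform a binary one is to compare their mean-square errors on synthetic data. The sketch below assumes complex Gaussian speech and noise coefficients with known variances, and uses the Wiener gain as a stand-in continuous gain; it is not the spectral magnitude MMSE estimator of the paper, and the variable names and variances are illustrative. It measures MSE only, which is not the same thing as perceived sound quality.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma_s2, sigma_n2 = 1.0, 0.5  # assumed speech and noise variances

def complex_gaussian(variance, size):
    """Zero-mean circular complex Gaussian samples with the given variance."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

S = complex_gaussian(sigma_s2, n)  # clean speech DFT coefficients
N = complex_gaussian(sigma_n2, n)  # noise DFT coefficients
Y = S + N                          # noisy observation

xi = sigma_s2 / sigma_n2                # a priori SNR
soft_gain = xi / (1.0 + xi)             # Wiener gain (continuous)
binary_gain = 1.0 if xi > 1.0 else 0.0  # keep or discard the whole bin

mse_soft = np.mean(np.abs(soft_gain * Y - S) ** 2)
mse_binary = np.mean(np.abs(binary_gain * Y - S) ** 2)
```

With these variances the expected errors are 1/3 for the Wiener gain and 1/2 for the binary gain, since a gain of 1 leaves all of the noise in place.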