2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017
DOI: 10.1109/icassp.2017.7952238

Improving the perceptual quality of ideal binary masked speech

Abstract: It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical s…

Cited by 9 publications (14 citation statements)
References 24 publications
“…By defining the cost function on the log-spectral amplitude error, we determine a gain function in the form of a generalized binary mask, which enables improved speech intelligibility [16]. Recently the binary mask techniques have been proposed as a signal processing tool to study and analyze the time-frequency analysis and grouping process of the auditory system [17]- [20]. However, perhaps motivated by the significant intelligibility improvements achievable in this ideal setting, where local target-to-noise energy ratios are known with certainty, the binary mask framework has more recently been adapted to the practical problem of retrieving a target speech signal from a noisy mixture in the non-ideal situation where the local target-to-noise energy ratios are unknown, but must be estimated from the noisy mixture signal.…”
Section: Literature Survey; citation type: mentioning
confidence: 99%
“…Unfortunately, the direct application of a binary mask leads to poor perceived quality [11]. In this work, we (i) use a deep neural network (DNN) to estimate a binary mask that captures the modulations of the target source and (ii) use the mask to define the speech presence probability in a conventional speech enhancer [12,13].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Besides, mask-based SE methods have predominantly been applied in many SE and speech separation applications [7]. The key idea behind mask-based SE methods is to estimate a spectrographic binary or soft mask to suppress the unwanted spectrogram components [7][8][9][10][11]. For binary mask-based SE methods, the spectrographic masks are "hard binary masks" where a spectral component is either set to 1 for the target speech component or set to 0 for the non-target speech component.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…For binary mask-based SE methods, the spectrographic masks are "hard binary masks" where a spectral component is either set to 1 for the target speech component or set to 0 for the non-target speech component. Experimental results have shown that the performance of binary mask SE methods degrades with the decrease of the signal-to-noise ratio (SNR) and the masked spectral may cause the loss of speech components due to the harsh black or white binary conditions [7,8]. To overcome this disadvantage, the soft mask-based SE methods have been developed [8].…”
Section: Introduction; citation type: mentioning
confidence: 99%
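
To illustrate the distinction drawn in the citation statements above, here is a minimal Python sketch of a hard binary mask and a soft (ratio) mask over time-frequency cells. It is not taken from the cited papers: the function names, the 0 dB threshold, and the toy power values are all assumptions made for illustration. It also assumes the ideal setting in which the speech and noise powers of every cell are known; in practice they would have to be estimated from the noisy mixture.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Hard mask: 1 where the local speech-to-noise ratio exceeds the
    threshold (assumed 0 dB here), 0 elsewhere."""
    local_snr_db = 10.0 * np.log10((speech_power + EPS) / (noise_power + EPS))
    return (local_snr_db > threshold_db).astype(float)


def soft_mask(speech_power, noise_power):
    """Soft ratio mask: fraction of each cell's power attributed to speech,
    a value between 0 and 1 instead of a hard decision."""
    return speech_power / (speech_power + noise_power + EPS)


# Toy 2x2 grid of time-frequency cell powers (hypothetical values).
speech_power = np.array([[4.0, 0.1], [1.0, 9.0]])
noise_power = np.array([[1.0, 1.0], [2.0, 1.0]])

hard = binary_mask(speech_power, noise_power)
soft = soft_mask(speech_power, noise_power)

# Either mask would be applied as a per-cell gain on the noisy power.
noisy_power = speech_power + noise_power
hard_output = hard * noisy_power
soft_output = soft * noisy_power
```

In this toy grid the hard mask zeroes the cell with speech power 1.0 and noise power 2.0 entirely, whereas the soft mask retains about a third of it. This is the kind of speech-component loss the statements above attribute to binary decisions.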