2009
DOI: 10.1016/j.specom.2008.09.001

On the optimality of ideal binary time–frequency masks

Abstract: The concept of ideal binary time-frequency masks has received attention recently in monaural and binaural sound separation. Although often assumed, the optimality of ideal binary masks in terms of signal-to-noise ratio has not been rigorously addressed. In this paper we give a formal treatment of this issue and clarify the conditions for ideal binary masks to be optimal. We also experimentally compare the performance of ideal binary masks to that of ideal ratio masks on a speech mixture database and a music da…
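
The optimality question raised in the abstract can be illustrated with a small brute-force check. This is a minimal sketch, not the paper's formal treatment: it assumes the output SNR is computed directly on time-frequency units as 10·log10(Σ|S|² / Σ|S - M·Y|²) with mixture Y = S + N, which is one common definition and may not reproduce the exact conditions the paper derives. Under that assumption a unit contributes |N|² to the error when kept (M = 1) and |S|² when discarded (M = 0), so keeping exactly the units with |S|² > |N|² minimizes the error. The code confirms this by searching every binary mask on toy values; all function names and data are hypothetical.

```python
import itertools
import math


def output_snr_db(target, noise, mask):
    """Output SNR in dB of a masked mixture, with the target as reference.

    Assumed definition (for this sketch only):
        10 * log10( sum |S|^2 / sum |S - M*Y|^2 ),  with Y = S + N,
    summed over the time-frequency units in the flat lists `target` and `noise`.
    """
    signal_energy = sum(abs(s) ** 2 for s in target)
    error_energy = sum(abs(s - m * (s + n)) ** 2
                       for s, n, m in zip(target, noise, mask))
    if error_energy == 0:
        return math.inf
    return 10 * math.log10(signal_energy / error_energy)


def ideal_binary_mask(target, noise):
    # Keep a unit only where target energy exceeds noise energy (local SNR > 0 dB).
    return tuple(1 if abs(s) ** 2 > abs(n) ** 2 else 0
                 for s, n in zip(target, noise))


def best_binary_mask(target, noise):
    # Exhaustive search over all 2**K binary masks; feasible only for tiny K.
    return max(itertools.product((0, 1), repeat=len(target)),
               key=lambda mask: output_snr_db(target, noise, mask))


if __name__ == "__main__":
    # Hypothetical complex-valued time-frequency units (four units, flattened).
    S = [3 + 1j, 0.5, 2j, 0.2 - 0.1j]
    N = [1, 2 + 0.5j, 0.5, 1j]

    ibm = ideal_binary_mask(S, N)
    print("ideal binary mask:", ibm)                      # (1, 0, 1, 0)
    print("best binary mask: ", best_binary_mask(S, N))   # (1, 0, 1, 0)
    print("output SNR (dB):  ", round(output_snr_db(S, N, ibm), 2))
```

Because the numerator depends only on the target, maximizing this SNR is the same as minimizing the summed per-unit error, and that sum decouples across units; the exhaustive search grows as 2^K and serves only as a sanity check. Units with exactly equal target and noise energy are ties: either choice gives the same SNR.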

Cited by 133 publications (94 citation statements). References: 19 publications.

“…In the simplest setting, the idbm techniques assume that a target speech signal $x(n)$ has been contaminated by an additive noise source $w(n)$ such that the noisy mixture signal is given by $y(n) = x(n) + w(n)$, where $n$ denotes a discrete-time index. The idbm techniques decompose the signals $x(n)$, $w(n)$, and $y(n)$, e.g., by applying a discrete Fourier transform (DFT) in successive time frames [5], [6] or by using gammatone filter banks [7], [8], resulting in time-frequency units $X(k,m)$, $W(k,m)$, and $Y(k,m)$, respectively, where $k$ and $m$ are frequency and time indices. Generally speaking, the idbm techniques retain time-frequency units that are dominated by the target speech and suppress time-frequency units that are dominated by the noise source.…”
Section: Introduction, mentioning (confidence: 99%)
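
The DFT-based decomposition described in this citation statement can be sketched as a short-time Fourier transform: the signal is cut into overlapping frames, each frame is windowed, and a DFT of each frame yields one column of time-frequency units. This is an illustrative sketch under assumptions not taken from the cited works: a periodic Hann window, a frame length of 8 samples, a hop of 4 samples, and a naive O(N²) DFT written out for readability (practical code would use an FFT). The gammatone filter-bank alternative is not shown. Because the transform is linear, the mixture's units equal the sum of the target's and the noise's units, which is what allows a mask derived from target and noise energies to be applied to the mixture.

```python
import cmath
import math


def hann(length):
    # Periodic Hann window (an assumed choice of analysis window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def stft(signal, frame_len=8, hop=4):
    """Return time-frequency units as units[m][k].

    m is the frame (time) index and k the frequency bin, 0..frame_len//2.
    Frame length and hop are illustrative defaults, not values from the source.
    """
    window = hann(frame_len)
    units = []
    # Only full frames are analysed; trailing samples that do not fill a frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT of the windowed frame, keeping the non-negative frequencies.
        spectrum = [
            sum(frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        units.append(spectrum)
    return units


if __name__ == "__main__":
    n_samples = 32
    # Hypothetical target: a cosine centred on frequency bin 2.
    target = [math.cos(2 * math.pi * 2 * n / 8) for n in range(n_samples)]
    # Hypothetical noise: a small deterministic ramp, used only for illustration.
    noise = [0.01 * n for n in range(n_samples)]
    mixture = [x + w for x, w in zip(target, noise)]

    X, W, Y = stft(target), stft(noise), stft(mixture)
    print(len(Y), "frames x", len(Y[0]), "bins")   # 7 frames x 5 bins
    # Linearity check: mixture units equal target units plus noise units.
    print(all(abs(y - (x + w)) < 1e-9
              for xr, wr, yr in zip(X, W, Y)
              for x, w, y in zip(xr, wr, yr)))     # True
```

With 32 samples, an 8-sample frame, and a 4-sample hop, frames start at samples 0, 4, ..., 24, giving 7 frames of 5 non-negative frequency bins each.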
“…For each time-frequency unit in the binary matrix, noise is attenuated if the energy of the noise equals or exceeds the energy of the target speech (i.e., the unit has a local SNR of 0 dB or below). If the local SNR is above 0 dB, the unit is retained in the binary matrix, which optimizes the SNR gain of the binary mask (Li & Wang, 2009). Designed to enhance speech intelligibility, the binary-masking noise reduction scheme has been shown to yield substantial improvements in speech recognition against a background of irrelevant speech (Brungart et al., 2006; Wang et al., 2009).…”
Section: Hearing Aids and Signal Processing, mentioning (confidence: 99%)
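
The masking rule quoted above (attenuate a unit when its local SNR is 0 dB or below, retain it otherwise) can be sketched in a few lines. This is a minimal illustration, assuming the target and noise time-frequency units are available separately, as they are for an ideal mask, and are stored as nested lists indexed [m][k]. The local-criterion parameter `lc_db` and the attenuation factor `floor` are hypothetical parameters added for illustration; with the defaults `lc_db=0.0` and `floor=0.0`, units at or below 0 dB are zeroed, matching the quoted rule.

```python
def ideal_binary_mask(target_units, noise_units, lc_db=0.0):
    """Return 1 where the local SNR exceeds lc_db (hypothetical parameter), else 0.

    Comparing energies directly (|S|^2 > 10^(lc_db/10) * |N|^2) avoids taking
    the log of zero for silent units. A unit at exactly lc_db gets 0.
    """
    threshold = 10 ** (lc_db / 10)
    return [
        [1 if abs(s) ** 2 > threshold * abs(n) ** 2 else 0
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target_units, noise_units)
    ]


def apply_mask(mixture_units, mask, floor=0.0):
    """Keep mixture units where the mask is 1; scale the others by `floor`."""
    return [
        [y if m else floor * y for y, m in zip(y_row, m_row)]
        for y_row, m_row in zip(mixture_units, mask)
    ]


if __name__ == "__main__":
    # Hypothetical 2 x 2 grid of target and noise units (real-valued for brevity).
    target = [[2.0, 0.25], [0.5, 1.0]]
    noise = [[1.0, 1.0], [0.5, 0.5]]
    mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(target, noise)]

    mask = ideal_binary_mask(target, noise)
    print(mask)                        # [[1, 0], [0, 1]]
    print(apply_mask(mixture, mask))   # [[3.0, 0.0], [0.0, 1.5]]
```

The unit at row 1, column 0 has equal target and noise energy (exactly 0 dB), so it is attenuated. Lowering the criterion keeps more units; for example, `lc_db=-6.0` would also retain that unit in this grid.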
“…Fields of research investigating the effect of auditory interferers include the perception of environmental noise [2,3], the perception of multiple talkers [4], source separation [5], and combinations of these [6]. These studies generally do not consider common domestic interferers, such as music or sound effects in films; where they do, they either do not isolate the effect of the interferer or they include artifacts and degradations that may be specific to source separation algorithms.…”
Section: Introduction, mentioning (confidence: 99%)
“…A similar conceptual framework was utilized in [1]. While the relationship between the effect of the interferer and the effect of target quality degradations is unclear, a considerable body of research exists on these topics individually.…”
mentioning (confidence: 99%)