2009
DOI: 10.1121/1.3179673

Role of mask pattern in intelligibility of ideal binary-masked noisy speech

Abstract: Intelligibility of ideal binary-masked noisy speech was measured in a group of normal-hearing individuals across mixture signal-to-noise ratio (SNR) levels, masker types, and local criteria for forming the binary mask. The binary mask is computed from time-frequency decompositions of the target and masker signals using two different schemes: an ideal binary mask, computed by thresholding the local SNR within time-frequency units, and a target binary mask, computed by comparing the local target energy against the long…
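A minimal sketch of the two mask definitions, assuming the time-frequency (T-F) unit energies are already available as (channels × frames) NumPy arrays from some filterbank front end (the abstract does not name one here). The ideal binary mask keeps a unit when its local SNR exceeds the local criterion LC. Because the abstract's definition of the target binary mask's reference is cut off, `ref_energy` is a hypothetical, caller-supplied per-channel energy, not the paper's definition.

```python
import numpy as np

# Assumed layout: energies are (n_channels, n_frames) arrays of T-F unit energies.
EPS = np.finfo(float).tiny  # guards against log10(0) and division by zero


def ideal_binary_mask(target_energy, masker_energy, lc_db):
    """IBM: 1 where the local SNR of a T-F unit exceeds the local criterion LC (dB)."""
    target_energy = np.asarray(target_energy, dtype=float)
    masker_energy = np.asarray(masker_energy, dtype=float)
    local_snr_db = 10 * np.log10((target_energy + EPS) / (masker_energy + EPS))
    return (local_snr_db > lc_db).astype(np.uint8)


def target_binary_mask(target_energy, ref_energy, lc_db):
    """TBM sketch: compare local target energy with a per-channel reference energy.

    ref_energy (shape (n_channels,)) is a hypothetical stand-in; the abstract's
    exact definition of the reference is truncated.
    """
    target_energy = np.asarray(target_energy, dtype=float)
    ref_energy = np.asarray(ref_energy, dtype=float)
    ratio_db = 10 * np.log10((target_energy + EPS) / (ref_energy[:, None] + EPS))
    return (ratio_db > lc_db).astype(np.uint8)


T = np.array([[1.0, 0.01], [4.0, 1.0]])
M = np.array([[0.1, 1.0], [1.0, 1.0]])
# The bottom-right unit has a local SNR of exactly 0 dB: it is excluded at
# LC = 0 dB but retained at LC = -5 dB.
print(ideal_binary_mask(T, M, lc_db=0.0))
print(ideal_binary_mask(T, M, lc_db=-5.0))
```

Lowering LC admits more units into the mask, which is why intelligibility is reported as a function of LC as well as of mixture SNR.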

Cited by 152 publications (203 citation statements)
References 24 publications (20 reference statements)

“…Our method of applying an SNR-dependent binary mask to the target speech resembles the technique of ideal time-frequency segregation (ITFS) that is known from computational auditory scene analysis studies (e.g., Wang 2005; Brungart 2006; Kjems et al. 2009). Although these studies used diotic signals and applied Fig.…”
Section: Discussion (mentioning)
confidence: 99%
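Ideal time-frequency segregation can be sketched as three steps: build a binary mask from the separately known target and masker, apply it to the mixture's T-F representation, and resynthesize. This version assumes single-channel signals and swaps in a NumPy-only STFT (512-sample periodic square-root Hann frames at 50% overlap) as the front end; the cited studies may use a different filterbank, and the frame sizes and function names are illustrative.

```python
import numpy as np

# Assumed front end: NumPy-only STFT, 512-sample frames, 50% overlap.
N_FFT, HOP = 512, 256
# Periodic square-root Hann window: analysis x synthesis gives a Hann window,
# whose 50%-overlapped copies sum to 1, so unmasked frames resynthesize exactly.
WIN = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT))
EPS = np.finfo(float).tiny


def stft(x):
    """Complex spectrogram with shape (n_bins, n_frames)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def istft(spec, length):
    """Weighted overlap-add resynthesis of a (n_bins, n_frames) spectrogram."""
    frames = np.fft.irfft(spec.T, n=N_FFT, axis=1) * WIN
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out


def itfs(target, masker, lc_db):
    """ITFS sketch: IBM from target and masker, applied to the mixture spectrogram.

    Returns (resynthesized signal, boolean mask).
    """
    target = np.asarray(target, dtype=float)
    masker = np.asarray(masker, dtype=float)
    T, M = stft(target), stft(masker)
    mixture = stft(target + masker)
    local_snr_db = 10 * np.log10((np.abs(T) ** 2 + EPS) / (np.abs(M) ** 2 + EPS))
    mask = local_snr_db > lc_db
    return istft(mixture * mask, len(target)), mask


rng = np.random.default_rng(0)
speech_like = rng.standard_normal(4096)  # placeholder signals, not speech
noise = rng.standard_normal(4096)
segregated, mask = itfs(speech_like, noise, lc_db=-5.0)
print(f"fraction of T-F units kept: {mask.mean():.2f}")
```

With `lc_db=-np.inf` every unit is kept and the output matches the mixture except within one hop of the signal edges, where the window taper is not compensated; raising LC discards more masker-dominated units.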
“…The choices made here, with LC about 5 dB smaller than input SNR, were motivated by values shown to be effective for noisy sentences (Brungart et al., 2006; Li and Loizou, 2008; Wang et al., 2009; Kjems et al., 2009). It is possible that LC values that are most effective for consonant materials will differ from those for sentence materials, perhaps due to the increased requirements for acoustic speech information and increased reliance on bottom-up processing.…”
Section: Assessing Benefit (mentioning)
confidence: 99%
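The quoted rule ties LC to the mixture SNR: placing LC about 5 dB below the input SNR means a mixture at -3 dB SNR would use LC = -8 dB. A minimal sketch, assuming mixture SNR is the ratio of whole-signal mean-square powers; the function names, the `offset_db` parameter, and the example SNR values are illustrative, not taken from the cited study.

```python
import numpy as np


def scale_masker_to_snr(target, masker, snr_db):
    """Rescale masker so that 10*log10(P_target / P_masker) equals snr_db.

    Power is the mean square over the whole signal (an assumed SNR definition).
    """
    target = np.asarray(target, dtype=float)
    masker = np.asarray(masker, dtype=float)
    p_t, p_m = np.mean(target ** 2), np.mean(masker ** 2)
    return masker * np.sqrt(p_t / (p_m * 10 ** (snr_db / 10)))


def local_criterion(snr_db, offset_db=-5.0):
    """LC placed offset_db relative to the mixture SNR (about 5 dB below it here)."""
    return snr_db + offset_db


for snr in (-7.0, -3.0, 0.0):
    print(f"SNR {snr:+.0f} dB -> LC {local_criterion(snr):+.0f} dB")
```

Expressing LC relative to SNR keeps the mask density roughly comparable across SNR conditions, which is one reason such fixed offsets are convenient when sweeping SNR.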
“…The implementation is based on the observation that the structure and shape of the binary mask patterns are important for both human [15] and machine recognition of speech [26], and that there are similarities between the binary patterns corresponding to a phonetic unit [27]. Our goal is to encode the prior information about the structure of the binary mask corresponding to a BPU in a simple averaged model that can then be used to refine a bottom-up mask estimated using a conventional IBM estimation algorithm.…”
Section: Implementation Using Average Mask Priors (mentioning)
confidence: 99%
“…An element of this vector represents the probability of the frequency channel being speech dominant given the phonetic identity of the time-frame. Since we want the AMPs to be independent of a specific noise condition, they are formed based on the target binary mask (TBM) [15] as opposed to the ideal binary mask. The TBM is defined similarly to Eq.…”
Section: Implementation Using Average Mask Priors (mentioning)
confidence: 99%
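The averaged model described in the two statements above can be sketched as an empirical relative frequency: for each phonetic label, average the TBM frames carrying that label, so each channel's value estimates the probability that the channel is speech dominant given the phonetic identity. This assumes a TBM stored as a (channels × frames) 0/1 array with one label per frame; the function name is hypothetical, and the step that uses these priors to refine a bottom-up mask is not described in the excerpts, so it is not sketched.

```python
import numpy as np


def average_mask_priors(tbm, frame_labels):
    """Estimate an average mask prior (AMP) vector per phonetic label.

    tbm: (n_channels, n_frames) 0/1 target binary mask.
    frame_labels: one phonetic-unit label per frame.
    Returns {label: length-n_channels vector}; each element is the fraction of
    that label's frames in which the channel is marked speech dominant.
    """
    tbm = np.asarray(tbm, dtype=float)
    labels = np.asarray(frame_labels)
    if tbm.shape[1] != labels.shape[0]:
        raise ValueError("need exactly one label per mask frame")
    return {lab.item(): tbm[:, labels == lab].mean(axis=1) for lab in np.unique(labels)}


tbm = [[1, 1, 0, 0],
       [0, 1, 1, 1]]
amps = average_mask_priors(tbm, ["a", "a", "b", "b"])
print(amps["a"], amps["b"])
```

Building the priors from the TBM rather than the IBM keeps them independent of the masker, matching the motivation given in the quoted text.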