Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility

Stilp, Christian E.; Kluender, Keith R.

doi:10.1073/pnas.0913625107

Cited by 98 publications (92 citation statements)

References 40 publications


“…Such measures typically place equal emphasis on all segments processed, paying the same attention to transitional segments marked with significant spectral change (e.g., vowel-consonant boundaries) and to steady-state (or quasi-steady-state) segments (e.g., vowel centers). This is, however, contrary to existing speech perception literature pointing to differences in the contributions of vowels vs. consonants (e.g., Kewley-Port et al., 2007) and differences between low- and high-entropy segments (Stilp and Kluender, 2010) to speech recognition. If vowels do indeed carry more information than consonants, that would suggest developing intelligibility measures that place more emphasis on vocalic segments than on consonant segments.…”

Section: Introduction (contrasting; confidence: 54%)

“…The underlying hypothesis is that including only these information-bearing segments in the computation of intelligibility indices ought to improve the correlation with human listeners' intelligibility scores relative to the scenario where all segments are included. Unlike previous studies (e.g., Kewley-Port et al., 2007; Stilp and Kluender, 2010) that replaced the segments of interest with equal-level noise and assessed their importance with listening experiments, the present study indirectly evaluates the perceptual importance of these segments in the context of intelligibility measures, with the main goal of improving the predictive power of existing intelligibility measures. Clearly, the method used for segmenting sentences (whether phonetically or not) into different units will affect the predictive power of the intelligibility index.…”

Section: Introduction (mentioning; confidence: 96%)

“…A remarkably robust correlation was found with cochlea-scaled entropy predicting listeners' intelligibility scores. Stilp and Kluender (2010) also reported that the duration of the signal replaced and proportion of consonants/vowels replaced were not significant predictors of intelligibility, and thus did not account for the strong relationship between CSE and sentence intelligibility.…”

Section: Introduction (mentioning; confidence: 99%)

“…To compute the cochlea-scaled spectral entropy (Stilp and Kluender, 2010), the sentence is first normalized according to its RMS intensity and then divided into 16-ms segments. Each segment is then bandpass filtered into M bands using roex (rounded-exponential) filters (Patterson et al., 1982).…”

Section: A. Scaled-entropy-based segmentation (mentioning; confidence: 99%)
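The computation described in the quote above can be sketched in Python. This is a minimal, dependency-free sketch, not Stilp and Kluender's implementation: the roex(p) weights use the Glasberg and Moore (1990) ERB formula, each filter is applied as a weighting of a segment's DFT power spectrum (rectangular window), band levels are taken in dB, and, as an assumption, the spectral change between adjacent segments is measured as the Euclidean distance between their band-level vectors. The helper names (`erb_spaced_centers`, `cse_profile`, etc.) and the choice of centre frequencies are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def erb_hz(fc: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def erb_spaced_centers(f_lo: float, f_hi: float, m: int) -> List[float]:
    """Hypothetical helper: M centre frequencies equally spaced on the ERB-number scale."""
    def to_erb(f: float) -> float:
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_erb(e: float) -> float:
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_erb(f_lo), to_erb(f_hi)
    step = (hi - lo) / (m - 1) if m > 1 else 0.0
    return [from_erb(lo + i * step) for i in range(m)]


def roex_weight(f: float, fc: float) -> float:
    """Symmetric roex(p) power weight (1 + p*g) * exp(-p*g), with g = |f - fc| / fc."""
    p = 4.0 * fc / erb_hz(fc)
    g = abs(f - fc) / fc
    return (1.0 + p * g) * math.exp(-p * g)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT power for bins 0..N/2 (rectangular window; slow but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def cse_profile(signal: Sequence[float], fs: float, centers: Sequence[float],
                seg_ms: float = 16.0) -> List[float]:
    """Spectral change between successive segments (one value per adjacent pair)."""
    if not signal:
        raise ValueError("empty signal")
    # 1) Normalize the whole sentence by its RMS level.
    total_rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    if total_rms == 0.0:
        raise ValueError("silent signal")
    x = [s / total_rms for s in signal]

    # 2) Divide into non-overlapping segments (16 ms by default); drop a partial tail.
    n = int(round(fs * seg_ms / 1000.0))
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]

    # 3) Band levels in dB: roex-weighted sums of each segment's power spectrum.
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    weights = [[roex_weight(f, fc) for f in freqs] for fc in centers]
    levels = []
    for frame in frames:
        ps = power_spectrum(frame)
        levels.append([10.0 * math.log10(sum(w * p for w, p in zip(wb, ps)) + 1e-12)
                       for wb in weights])

    # 4) ASSUMPTION: spectral change = Euclidean distance between adjacent segments.
    return [math.dist(a, b) for a, b in zip(levels, levels[1:])]
```

A steady tone yields a near-zero profile, while an abrupt change in spectral content produces a large value at the segment boundary where it occurs. Ranking or summing this profile over longer windows would then give high-, medium-, and low-entropy regions; the window length is left as a free choice in this sketch.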

“…Grounded in the well-known fact that perceptual systems respond primarily to change, Stilp and Kluender (2010) recently suggested that cochlea-scaled entropy (CSE), not vowels, consonants, or segment duration, best predicts speech intelligibility. They measured cochlea-scaled entropy in TIMIT sentences and replaced portions of the sentences having high, medium, or low entropy with equal-level noise.…”

Section: Introduction (mentioning; confidence: 99%)
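The replacement manipulation described in the quote above can be sketched as follows, assuming per-segment entropy scores are already available. Two interpretations here are assumptions rather than details from the source: "equal-level" is taken to mean Gaussian noise scaled to the RMS of the segment it replaces, and the "medium" band is taken as the k segments around the median score. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square value of a sequence (0.0 for an empty one)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) else 0.0


def select_segments(scores: Sequence[float], k: int, band: str = "high") -> List[int]:
    """Indices of the k segments with the highest, middle, or lowest scores."""
    if not 0 <= k <= len(scores):
        raise ValueError("k out of range")
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending score
    if band == "low":
        chosen = order[:k]
    elif band == "high":
        chosen = order[len(order) - k:]
    elif band == "medium":
        start = (len(order) - k) // 2  # ASSUMPTION: centred on the median
        chosen = order[start:start + k]
    else:
        raise ValueError("band must be 'low', 'medium', or 'high'")
    return sorted(chosen)


def replace_with_noise(signal: Sequence[float], seg_len: int, segments: Sequence[int],
                       seed: int = 0) -> List[float]:
    """Replace each listed segment with Gaussian noise at that segment's RMS level."""
    rng = random.Random(seed)
    out = list(signal)
    for s in segments:
        start, stop = s * seg_len, min((s + 1) * seg_len, len(signal))
        if start >= stop:
            raise IndexError(f"segment {s} lies outside the signal")
        target = rms(signal[start:stop])  # level of the original speech segment
        noise = [rng.gauss(0.0, 1.0) for _ in range(stop - start)]
        noise_rms = rms(noise)
        gain = target / noise_rms if noise_rms > 0.0 else 0.0
        out[start:stop] = [gain * v for v in noise]
    return out
```

Matching the level per segment keeps the overall loudness contour of the sentence intact, so any drop in intelligibility can be attributed to the information removed rather than to a change in level; a fixed reference level would be an equally plausible reading of "equal-level".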


Contributions of cochlea-scaled entropy and consonant-vowel boundaries to prediction of speech intelligibility in noise

Chen, Loizou (2012). The Journal of the Acoustical Society of America

Recent evidence suggests that spectral change, as measured by cochlea-scaled entropy (CSE), predicts speech intelligibility better than the information carried by vowels or consonants in sentences. Motivated by this finding, the present study investigates whether intelligibility indices implemented to include segments marked with significant spectral change better predict speech intelligibility in noise than measures that include all phonetic segments, paying no attention to vowels/consonants or spectral change. The prediction of two intelligibility measures [normalized covariance measure (NCM), coherence-based speech intelligibility index (CSII)] is investigated using three sentence-segmentation methods: relative root-mean-square (RMS) levels, CSE, and traditional phonetic segmentation of obstruents and sonorants. While the CSE method makes no distinction between spectral changes occurring within vowels/consonants, the RMS-level segmentation method places more emphasis on the vowel-consonant boundaries, where the spectral change is often most prominent, and perhaps most robust, in the presence of noise. Higher correlation with intelligibility scores was obtained when including sentence segments containing a large number of consonant-vowel boundaries than when including segments with the highest entropy or segments based on obstruent/sonorant classification. These data suggest that in the context of intelligibility measures the type of spectral change captured by the measure is important.
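The relative-RMS-level segmentation named in the abstract can be sketched as a frame classifier. The three-way split below (high: at or above 0 dB relative to the sentence's overall RMS; mid: 0 to -10 dB; low: -10 to -30 dB) is modeled on the level regions commonly described for the CSII, but these thresholds, the 16-ms frame length, and the function names are assumptions for illustration, not necessarily the values Chen and Loizou used.

```python
import math
from typing import Dict, List, Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square value of a sequence (0.0 for an empty one)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if len(x) else 0.0


def rms_level_segments(signal: Sequence[float], fs: float, frame_ms: float = 16.0,
                       high_db: float = 0.0, mid_db: float = -10.0,
                       low_db: float = -30.0) -> Dict[str, List[int]]:
    """Group frame indices by level relative to the whole sentence's RMS (dB)."""
    total = rms(signal)
    if total == 0.0:
        raise ValueError("silent or empty signal")
    n = int(round(fs * frame_ms / 1000.0))
    groups: Dict[str, List[int]] = {"high": [], "mid": [], "low": []}
    for j, start in enumerate(range(0, len(signal) - n + 1, n)):
        r = rms(signal[start:start + n])
        level = 20.0 * math.log10(r / total) if r > 0.0 else -math.inf
        if level >= high_db:
            groups["high"].append(j)
        elif level >= mid_db:
            groups["mid"].append(j)
        elif level >= low_db:
            groups["low"].append(j)
        # Frames below low_db (e.g. silence) are left out of every group.
    return groups
```

Because weak consonants and formant transitions tend to sit a few dB below the vowel peaks, the mid-level group is one plausible way such a method ends up emphasizing vowel-consonant boundaries. An intelligibility index can then be computed over only the frames in a chosen group.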


Measures of Speech Perception

Wouters, Gransier, van Wieringen (2024). The Handbook of Clinical Linguistics, Second Edition

Shared neural and cognitive mechanisms in action and language: The multiscale information transfer framework

Blumenthal-Dramé, Malaia (2018). WIREs Cognitive Science

This review compares how humans process action and language sequences produced by other humans. On the one hand, we identify commonalities between action and language processing in terms of cognitive mechanisms (e.g., perceptual segmentation, predictive processing, integration across multiple temporal scales), neural resources (e.g., the left inferior frontal cortex), and processing algorithms (e.g., comprehension based on changes in signal entropy). On the other hand, drawing on sign language with its particularly strong motor component, we also highlight what differentiates (both oral and signed) linguistic communication from nonlinguistic action sequences. We propose the multiscale information transfer framework (MSIT) as a way of integrating these insights and highlight directions into which future empirical research inspired by the MSIT framework might fruitfully evolve. This article is categorized under: Psychology > Language; Linguistics > Language in Mind and Brain; Psychology > Motor Skill and Performance; Psychology > Prediction.
