The slow time-intensity modulation of the speech envelope, defined as fluctuations in overall amplitude at rates between about 2 Hz and 50 Hz, conveys important linguistic information, including manner of articulation, the presence of voicing, and some prosodic information [1]. The importance of these temporal envelope cues to speech perception has been emphasized because they are believed to be, in effect, the only information available to people with severe or profound sensorineural hearing loss and to cochlear implant (CI) users [2][3][4][5]. If listeners with severe or profound hearing loss can only utilize limited fine spectral and temporal information [6], temporal envelope

Purpose: The goal of the present study was to investigate the effect of temporal envelope cues on consonant confusions.

Methods: The temporal envelope was extracted from each of 16 consonant-vowel (CV) sounds using 26 psychoacoustically based critical auditory bands. Temporal smearing of these processed signals was produced by applying low-pass filters (LPF) with one of five cutoff frequencies. Confusion matrices were measured in normal-hearing listeners as a function of signal-to-noise ratio (SNR).
Results: Temporal envelope information processed by the critical auditory bands provided much poorer consonant cues than information processed with fewer, wider auditory bands. The error rate for consonant perception decreased as temporal modulation increased across SNRs, with the SNR carrying more weight than the LPF cutoff. The results also revealed three sound groupings: four CVs were the most difficult sounds, seven CVs were the easiest sounds, and five CVs were influenced the most by the LPF cutoff. The confusion patterns were similar between the unprocessed CVs and the temporally processed CVs. Duration contributed the most and affrication the least to consonant perception.

Conclusions: Consonant perception is strongly affected when the LPF cutoff is lower than 8 Hz. Confusion patterns are similar between the natural consonants and the temporally processed consonants, even though the overall error rate is higher with the temporal envelope cues. The results of the current study could provide control data for the many cochlear implant studies that used acoustic simulations with a vocoder.
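To make the idea of temporal smearing concrete, the following is a minimal single-band sketch in Python, not the processing used in the study. It assumes a rectify-and-smooth envelope detector (full-wave rectification followed by a first-order low-pass filter), whereas the study extracted envelopes in 26 critical bands and does not specify its filters here. The function names, the amplitude-modulated test tone, and the 16 Hz and 2 Hz cutoffs are all hypothetical choices made for illustration.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1]).

    The -3 dB point is close to cutoff_hz when cutoff_hz << fs.
    """
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y = 0.0
    out = []
    for v in x:
        y += a * (v - y)
        out.append(y)
    return out


def smeared_envelope(signal, cutoff_hz, fs):
    """Full-wave rectify, then low-pass; a lower cutoff smears the envelope more."""
    rectified = [abs(v) for v in signal]
    return one_pole_lowpass(rectified, cutoff_hz, fs)


def modulation_depth(env):
    """Depth of envelope fluctuation, (max - min) / (max + min)."""
    hi, lo = max(env), min(env)
    return (hi - lo) / (hi + lo)


def am_tone(fs, dur_s, carrier_hz, mod_hz):
    """Hypothetical test signal: a carrier fully amplitude-modulated at mod_hz."""
    n = int(fs * dur_s)
    return [
        0.5 * (1.0 + math.sin(2.0 * math.pi * mod_hz * t / fs))
        * math.sin(2.0 * math.pi * carrier_hz * t / fs)
        for t in range(n)
    ]


if __name__ == "__main__":
    fs = 8000
    tone = am_tone(fs, 2.0, 1000.0, 10.0)  # 10 Hz modulation on a 1 kHz carrier
    for cutoff in (16.0, 2.0):  # illustrative cutoffs, not the study's values
        env = smeared_envelope(tone, cutoff, fs)[fs:]  # drop the first second (filter settling)
        print(cutoff, round(modulation_depth(env), 2))
```

In this sketch, a cutoff above the 10 Hz modulation rate leaves most of the envelope fluctuation intact, while a cutoff well below it flattens the envelope. This is the sense in which lowering the LPF cutoff removes temporal modulation cues.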