2021
DOI: 10.1121/10.0006285

Speech intelligibility and talker gender classification with noise-vocoded and tone-vocoded speech

Abstract: Vocoded speech provides less spectral information than natural, unprocessed speech, negatively affecting listener performance on speech intelligibility and talker gender classification tasks. In this study, young normal-hearing participants listened to noise-vocoded and tone-vocoded (i.e., sinewave-vocoded) sentences containing 1, 2, 4, 8, 16, or 32 channels, as well as non-vocoded sentences, and reported the words heard as well as the gender of the talker. Overall, performance was significantly better with to…

Cited by 5 publications
(2 citation statements)
References 27 publications
“…No acoustic interferers were used in this experiment, but in terms of the framework for adverse listening conditions proposed by Mattys et al. (2009), the perceptual load associated with high-resolution vocoding should have an EM component (here, signal degradation) associated with any loss of target intelligibility and an IM component (cognitive load) associated with the extent of change in stimulus naturalness. The EM component should increase the reliance on acoustic detail but should be small given that, at least for word identification in sentences, there is no intelligibility cost for MNB and MSB vocoding relative to natural speech when the number of channels is ≥16 (Villard and Kidd, 2021). The IM component should act to increase the reliance on lexical information and was predicted to be considerably larger for the MSB than for the MNB stimuli.…”
Section: Methods (mentioning)
confidence: 97%
“…The rationale for using this type of NVS was that potential context-driven auditory learning effects are, generally, maximal when the auditory signal is poor but still contains sufficient information to allow for context-driven perceptual restoration. Prior research has indicated that NVS word identification accuracy at least doubles when moving up from 2 to 4 channels, but then increases with about half that effect size when moving from 4 to 8 channels (where accuracy tapered off and was comparable to 16- and 32-channel NVS [36]). Moreover, two-year-olds start to show the first signs of word recognition with 4-channel NVS (when compared to 2-channel NVS [37]), and Senan et al. [38] showed that dual-task interference from 4-channel NVS was in between, but statistically comparable to, 2-channel and 6-channel NVS (the primary task was a digit-recall task), whereas interference from 4- and 6-channel NVS was also statistically comparable to natural speech.…”
Section: Methods (mentioning)
confidence: 99%