For optimal word recognition listeners should use all relevant acoustic information as soon as it comes available. Using printed-word eye tracking we investigated when during word processing Dutch listeners use suprasegmental lexical stress information to recognize words. Fixations on targets such as "OCtopus" (capitals indicate stress) were more frequent than fixations on segmentally overlapping but differently stressed competitors ("okTOber") before segmental information could disambiguate the words. Furthermore, prior to segmental disambiguation, initially stressed words were stronger lexical competitors than noninitially stressed words. Listeners recognize words by immediately using all relevant information in the speech signal.
Listeners use lexical knowledge to adjust to speakers’ idiosyncratic pronunciations. Dutch listeners learn to interpret an ambiguous sound between /s/ and /f/ as /f/ if they hear it word-finally in Dutch words normally ending in /f/, but as /s/ if they hear it in normally /s/-final words. Here, we examined two positional effects in lexically guided retuning. In Experiment 1, ambiguous sounds during exposure always appeared in word-initial position (replacing the first sounds of /f/- or /s/-initial words). No retuning was found. In Experiment 2, the same ambiguous sounds always appeared word-finally during exposure. Here, retuning was found. Lexically guided perceptual learning thus appears to emerge reliably only when lexical knowledge is available as the to-be-tuned segment is initially being processed. Under these conditions, however, lexically guided retuning was position independent: It generalized across syllabic positions. Lexical retuning can thus benefit future recognition of particular sounds wherever they appear in words
A series of eye-tracking and categorization experiments investigated the use of speaking-rate information in the segmentation of Dutch ambiguous-word sequences. Juncture phonemes with ambiguous durations (e.g., [s] in 'eens (s)peer,' "once (s)pear," [t] in 'nooit (t)rap,' "never staircase/quick") were perceived as longer and hence more often as word-initial when following a fast than a slow context sentence. Listeners used speaking-rate information as soon as it became available. Rate information from a context proximal to the juncture phoneme and from a more distal context was used during on-line word recognition, as reflected in listeners' eye movements. Stronger effects of distal context, however, were observed in the categorization task, which measures the off-line results of the word-recognition process. In categorization, the amount of rate context had the greatest influence on the use of rate information, but in eye tracking, the rate information's proximal location was the most important. These findings constrain accounts of how speaking rate modulates the interpretation of durational cues during word recognition by suggesting that rate estimates are used to evaluate upcoming phonetic information continuously during prelexical speech processing.
a b s t r a c tThe strongest support for feedback in speech perception comes from evidence of apparent lexical influence on prelexical fricative-stop compensation for coarticulation. Lexical knowledge (e.g., that the ambiguous final fricative of Christma? should be [s]) apparently influences perception of following stops. We argue that all such previous demonstrations can be explained without invoking lexical feedback. In particular, we show that one demonstration [Magnuson, J. S., McMurray, B., Tanenhaus, M. K., & Aslin, R. N. (2003). Lexical effects on compensation for coarticulation: The ghost of Christmash past. Cognitive Science, 27, 285-298] involved experimentally-induced biases (from 16 practice trials) rather than feedback. We found that the direction of the compensation effect depended on whether practice stimuli were words or nonwords. When both were used, there was no lexicallymediated compensation. Across experiments, however, there were lexical effects on fricative identification. This dissociation (lexical involvement in the fricative decisions but not in the following stop decisions made on the same trials) challenges interactive models in which feedback should cause both effects. We conclude that the prelexical level is sensitive to experimentally-induced phoneme-sequence biases, but that there is no feedback during speech perception.
In the present study, we examined the distribution and processing of information over time in auditory and visual speech as it is used in unimodal and bimodal word recognition. English consonant-vowel-consonant words representing all possible initial consonants were presented as auditory, visual, or audiovisual speech in a gating task. The distribution of information over time varied across and within features. Visual speech information was generally fully available early during the phoneme, whereas auditory information was still accumulated. An audiovisual benefit was therefore already found early during the phoneme. The nature of the audiovisual recognition benefit changed, however, as more of the phoneme was presented. More features benefited at short gates rather than at longer ones. Visual speech information plays, therefore, a more important role early during the phoneme rather than later. The results of the study showed the complex interplay of information across modalities and time, since this is essential in determining the time course of audiovisual spoken-word recognition.
The influence of lexical characteristics of words in to-be-attended and to-be-ignored speech streams was examined in a competing speech task. Older, middle-aged, and younger adults heard pairs of low-cloze probability sentences in which the frequency or neighborhood density of words was manipulated in either the target speech stream or the masking speech stream. All participants also completed a battery of cognitive measures. As expected, for all groups, target words that occur frequently or that are from sparse lexical neighborhoods were easier to recognize than words that are infrequent or from dense neighborhoods. Compared to other groups, these neighborhood density effects were largest for older adults; the frequency effect was largest for middle-aged adults. Lexical characteristics of words in the to-be-ignored speech stream also affected recognition of tobe-attended words, but only when overall performance was relatively good (that is, when younger participants listened to the speech streams at a more advantageous signal-to-noise ratio). For these listeners, to-be-ignored masker words from sparse neighborhoods interfered with recognition of target speech more than masker words from dense neighborhoods. Amount of hearing loss and cognitive abilities relating to attentional control modulated overall performance as well as the strength of lexical influences.
Older listeners are more affected than younger listeners in their recognition of speech in adverse conditions, such as when they also hear a single-competing speaker. In the present study, we investigated with a speeded response task whether older listeners with various degrees of hearing loss benefit under such conditions from also seeing the speaker they intend to listen to. We also tested, at the same time, whether older adults need postperceptual processing to obtain an audiovisual benefit. When tested in a phoneme-monitoring task with single-talker noise present, older (and younger) listeners detected target phonemes more reliably and more rapidly in meaningful sentences uttered by the target speaker when they also saw the target speaker. This suggests that older adults processed audiovisual speech rapidly and efficiently enough to benefit already during spoken sentence processing. Audiovisual benefits for older adults were similar in size to those observed for younger adults in terms of response latencies, but smaller for detection accuracy. Older adults with more hearing loss showed larger audiovisual benefits. Attentional abilities predicted the size of audiovisual response time benefits in both age groups. Audiovisual benefits were found in both age groups when monitoring for the visually highly distinct phoneme /p/ and when monitoring for the visually less distinct phoneme /k/. Visual speech thus provides segmental information about the target phoneme, but also provides more global contextual information that helps both older and younger adults in this adverse listening situation
Many older listeners report difficulties in understanding speech in noisy situations. Working memory and other cognitive skills may modulate older listeners' ability to use context information to alleviate the effects of noise on spoken-word recognition. In the present study, we investigated whether verbal working memory predicts older adults' ability to immediately use context information in the recognition of words embedded in sentences, presented in different listening conditions. In a phoneme-monitoring task, older adults were asked to detect as fast and as accurately as possible target phonemes in sentences spoken by a target speaker. Target speech was presented without noise, with fluctuating speech-shaped noise, or with competing speech from a single distractor speaker. The gradient measure of contextual probability (derived from a separate offline rating study) affected the speed of recognition. Contextual facilitation was modulated by older listeners' verbal working memory (measured with a backward digit span task) and age across listening conditions. Working memory and age, as well as hearing loss, were also the most consistent predictors of overall listening performance. Older listeners' immediate benefit from context in spoken-word recognition thus relates to their ability to keep and update a semantic representation of the sentence content in working memory.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.