Participants' eye movements were monitored as they heard sentences and saw four pictured objects on a computer screen. Participants were instructed to click on the object mentioned in the sentence. There were more transitory fixations to pictures representing monosyllabic words (e.g. ham) when the first syllable of the target word (e.g. hamster) had been replaced by a recording of the monosyllabic word than when it came from a different recording of the target word. This demonstrates that a phonemically identical sequence can contain cues that modulate its lexical interpretation. This effect was governed by the duration of the sequence, rather than by its origin (i.e. which type of word it came from). The longer the sequence, the more monosyllabic-word interpretations it generated. We argue that cues to lexical-embedding disambiguation, such as segmental lengthening, result from the realization of a prosodic boundary that often but not always follows monosyllabic words, and that lexical candidates whose word boundaries are aligned with prosodic boundaries are favored in the word-recognition process.
Two experiments evaluated the time course and use of orthographic information in spoken-word recognition using the visual-world eye-tracking paradigm with printed words as referents. Participants saw four words on a computer screen and listened to spoken sentences instructing them to click on one of the words (e.g., Click on the word bead). The printed words appeared 200 ms before the onset of the spoken target word. In Experiment 1, the display included the target word and a competitor with either a lower degree of phonological overlap with the target (bear) or a higher degree of phonological overlap with the target (bean). Both competitors had the same degree of orthographic overlap with the target. There were more fixations to the competitors than to unrelated distracters. Crucially, the likelihood of fixating a competitor did not vary as a function of the amount of phonological overlap between target and competitor. In Experiment 2, the display included the target word and a competitor with either a lower degree of orthographic overlap with the target (bare) or a higher degree of orthographic overlap with the target (bear). Competitors were homophonous and thus had the same degree of phonological overlap with the target. There were more fixations to higher-overlap competitors than to lower-overlap competitors, beginning during the temporal interval where initial fixations driven by the vowel are expected to occur. The authors conclude that orthographic information is rapidly activated as a spoken word unfolds and is immediately used in mapping spoken words onto potential printed referents.
Keywords: spoken-word recognition; orthography; phonology; visual-world paradigm; eye movements
One of the more surprising findings in research on spoken-word recognition is that orthographic representations become available upon hearing a spoken word (see Frost & Ziegler, 2007, for a review).
An early study by Seidenberg and Tanenhaus (1979) found that rhyme judgments for pairs of spoken words were delayed for orthographically dissimilar words (e.g., tie…rye) compared to orthographically similar words (e.g., tie…lie). Orthographic information is not relevant for making rhyme judgments, and therefore, a priori, one would not expect to find evidence for the activation of orthographic information in this task. Initially, there were concerns that this result might be due to strategic processing of the stimuli (but cf. Donnenwerth-Nolan, Tanenhaus, & Seidenberg, 1981; Tanenhaus, Flanigan, & Seidenberg, 1980). However, later studies have employed paradigms using less explicit response measures. For instance, Ziegler and Ferrand (1998) demonstrated that in an auditory lexical decision task, participants make slower responses to words that are orthographically inconsistent (i.e., words whose rhyme can be spelled in multiple ways, e.g., beak) than to words that are orthographically consistent (i.e., words whose rhyme can be spelled in only one way, e.g., luck). Orthographic effects have been found in a variety of other tasks, suc...
Two visual-world experiments examined listeners’ use of pre-word-onset anticipatory coarticulation in spoken-word recognition. Experiment 1 established the shortest lag with which information in the speech signal influences eye-movement control, using stimuli such as “The … ladder is the target”. With a neutral token of the definite article preceding the target word, saccades to the referent were not more likely than saccades to an unrelated distractor until 200–240 ms after the onset of the target word. In Experiment 2, utterances contained definite articles which contained natural anticipatory coarticulation pertaining to the onset of the target word (“The ladder … is the target”). A simple Gaussian classifier was able to predict the initial sound of the upcoming target word from formant information from the first few pitch periods of the article’s vowel. With these stimuli, effects of speech on eye-movement control began about 70 ms earlier than in Experiment 1, suggesting rapid use of anticipatory coarticulation. The results are interpreted as support for “data explanation” approaches to spoken-word recognition. Methodological implications for visual-world studies are also discussed.
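To make the idea of a "simple Gaussian classifier" over formant measurements concrete, the following is a minimal sketch and not the authors' implementation. It assumes two features per token (F1 and F2 of the article's vowel), a diagonal covariance per class, and a tiny set of invented formant values and onset labels. The abstract does not specify the features, the covariance structure, or the training data, so all of these are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_classifier(samples, labels):
    """Estimate per-class feature means, variances, and priors.

    Assumption: features are treated as independent within a class
    (diagonal covariance); the source does not state the covariance form.
    """
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)

    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)
            for d in range(dims)
        ]
        model[label] = (means, variances, n / len(samples))
    return model


def log_posterior(model_entry, features):
    """Unnormalized log posterior: log prior plus Gaussian log likelihood."""
    means, variances, prior = model_entry
    total = math.log(prior)
    for x, mu, var in zip(features, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total


def predict(model, features):
    """Return the class label with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model[label], features))


# Hypothetical (F1, F2) values in Hz for the article's vowel; invented for
# illustration only, not measurements from the study.
training_formants = [
    (520.0, 1450.0), (530.0, 1480.0), (510.0, 1430.0),  # article before /l/
    (515.0, 1150.0), (525.0, 1120.0), (505.0, 1180.0),  # article before /b/
]
training_onsets = ["l", "l", "l", "b", "b", "b"]

model = fit_gaussian_classifier(training_formants, training_onsets)
predicted_onset = predict(model, (518.0, 1460.0))
```

In this toy setup the two classes differ mainly in F2, so a new token is assigned to whichever class has the closer F2 mean relative to that class's variance. A real analysis would need measured formant tracks and held-out tokens to estimate classification accuracy.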
Previous work examining prosodic cues in online spoken-word recognition has focused primarily on local cues to word identity. However, recent studies have suggested that utterance-level prosodic patterns can also influence the interpretation of subsequent sequences of lexically ambiguous syllables (Dilley, Mattys, & Vinke, Journal of Memory and Language, 63:274–294, 2010; Dilley & McAuley, Journal of Memory and Language, 59:294–311, 2008). To test the hypothesis that these distal prosody effects are based on expectations about the organization of upcoming material, we conducted a visual-world experiment. We examined fixations to competing alternatives such as pan and panda upon hearing the target word panda in utterances in which the acoustic properties of the preceding sentence material had been manipulated. The proportions of fixations to the monosyllabic competitor were higher beginning 200 ms after target word onset when the preceding prosody supported a prosodic constituent boundary following pan-, rather than following panda. These findings support the hypothesis that expectations based on perceived prosodic patterns in the distal context influence lexical segmentation and word recognition.
There is an emerging literature on visual search in natural tasks suggesting that task-relevant goals account for a remarkably high proportion of saccades, including anticipatory eye-movements. Moreover, factors such as “visual saliency” that otherwise affect fixations become less important when they are bound to objects that are not relevant to the task at hand. We briefly review this literature and discuss the implications for task-based variants of the visual world paradigm. We argue that the results and their likely interpretation may profoundly affect the “linking hypothesis” between language processing and the location and timing of fixations in task-based visual world studies. We outline a goal-based linking hypothesis and discuss some of the implications for how we conduct visual world studies, including how we interpret and analyze the data. Finally, we outline some avenues of research, including examples of some classes of experiments that might prove fruitful for evaluating the effects of goals in visual world experiments and the viability of a goal-based linking hypothesis.
Eye movements were monitored as participants followed spoken instructions to manipulate one of four objects pictured on a computer screen. Target words occurred in utterance-medial (e.g., Put the cap next to the square) or utterance-final position (e.g., Now click on the cap). Displays consisted of the target picture (e.g., a cap), a monosyllabic competitor picture (e.g., a cat), a polysyllabic competitor picture (e.g., a captain) and a distractor (e.g., a beaker). The relative proportion of fixations to the two types of competitor pictures changed as a function of the position of the target word in the utterance, demonstrating that lexical competition is modulated by prosodically conditioned phonetic variation.
Participants saw a small number of objects in a visual display and performed a visual detection or visual-discrimination task in the context of task-irrelevant spoken distractors. In each experiment, a visual cue was presented 400 ms after the onset of a spoken word. In experiments 1 and 2, the cue was an isoluminant color change and participants generated an eye movement to the target object. In experiment 1, responses were slower when the spoken word referred to the distractor object than when it referred to the target object. In experiment 2, responses were slower when the spoken word referred to a distractor object than when it referred to an object not in the display. In experiment 3, the cue was a small shift in location of the target object and participants indicated the direction of the shift. Responses were slowest when the word referred to the distractor object, faster when the word did not have a referent, and fastest when the word referred to the target object. Taken together, the results demonstrate that referents of spoken words capture attention.
Two visual-world experiments tested the hypothesis that expectations based on preceding prosody influence the perception of suprasegmental cues to lexical stress. The results demonstrate that listeners’ consideration of competing alternatives with different stress patterns (e.g., ‘jury/gi’raffe) can be influenced by the fundamental frequency and syllable timing patterns across material preceding a target word. When preceding stressed syllables distal to the target word shared pitch and timing characteristics with the first syllable of the target word, pictures of alternatives with primary lexical stress on the first syllable (e.g., jury) initially attracted more looks than alternatives with unstressed initial syllables (e.g., giraffe). This effect was modulated when preceding unstressed syllables had pitch and timing characteristics similar to the initial syllable of the target word, with more looks to alternatives with unstressed initial syllables (e.g., giraffe) than to those with stressed initial syllables (e.g., jury). These findings suggest that expectations about the acoustic realization of upcoming speech include information about metrical organization and lexical stress, and that these expectations constrain the initial interpretation of suprasegmental stress cues. These distal prosody effects implicate on-line probabilistic inferences about the sources of acoustic-phonetic variation during spoken-word recognition.