Ambiguity in natural language is ubiquitous, yet spoken communication is effective due to integration of information carried in the speech signal with information available in the surrounding multimodal landscape. Language-mediated visual attention requires the integration of visual and linguistic information and has thus been used to examine properties of the architecture supporting multimodal processing during spoken language comprehension. In this paper we test predictions generated by alternative models of this multimodal system. A model (TRACE) in which multimodal information is combined at the point of the lexical representations of words generated predictions of a stronger effect of phonological rhyme relative to semantic and visual information on gaze behaviour, whereas a model in which sub-lexical information can interact across modalities (MIM) predicted a greater influence of visual and semantic information, compared to phonological rhyme. Two visual world experiments designed to test these predictions offer support for sub-lexical multimodal interaction during online language processing.
Learning to map words onto their referents is difficult, because there are multiple possibilities for forming these mappings. Cross-situational learning studies have shown that word-object mappings can be learned across multiple situations, as can verbs when presented in a syntactic context. However, these previous studies have presented either nouns or verbs in ambiguous contexts and thus bypass much of the complexity of multiple grammatical categories in speech. We show that noun word learning in adults is robust when objects are moving, and that verbs can also be learned from similar scenes without additional syntactic information. Furthermore, we show that both nouns and verbs can be acquired simultaneously, thus resolving category-level as well as individual word-level ambiguity. However, nouns were learned more quickly than verbs, and we discuss this in light of previous studies investigating the noun advantage in word learning.
Language-mediated visual attention describes the interaction of two fundamental components of the human cognitive system, language and vision. Within this paper we present an amodal shared resource model of language-mediated visual attention that offers a description of the information and processes involved in this complex multimodal behavior and a potential explanation for how this ability is acquired. We demonstrate that the model is not only sufficient to account for the experimental effects of Visual World Paradigm studies but also that these effects are emergent properties of the architecture of the model itself, rather than requiring separate information processing channels or modular processing systems. The model provides an explicit description of the connection between the modality-specific input from language and vision and the distribution of eye gaze in language-mediated visual attention. The paper concludes by discussing future applications for the model, specifically its potential for investigating the factors driving observed individual differences in language-mediated eye gaze.
Learning to read and write requires an individual to connect additional orthographic representations to pre-existing mappings between phonological and semantic representations of words. Past empirical results suggest that the process of learning to read and write (at least in alphabetic languages) elicits changes in the language processing system, by either increasing the cognitive efficiency of mapping between representations associated with a word, or by changing the granularity of phonological processing of spoken language, or through a combination of both. Behavioural effects of literacy have typically been assessed in offline explicit tasks that have addressed only phonological processing. However, a recent eye tracking study compared high and low literate participants on effects of phonology and semantics in processing measured implicitly using eye movements. High literates' eye movements were more affected by phonological overlap in online speech than those of low literates, with only subtle differences observed in semantics. We determined whether these effects were due to cognitive efficiency and/or granularity of speech processing in a multimodal model of speech processing: the amodal shared resource model (ASR; Smith, Monaghan, & Huettig, 2013a,b). We found that cognitive efficiency in the model had only a marginal effect on semantic processing and did not affect performance for phonological processing, whereas fine-grained versus coarse-grained
Orthographic systems vary dramatically in the extent to which they encode a language's phonological and lexico-semantic structure. Studies of the effects of orthographic transparency suggest that such variation is likely to have major implications for how the reading system operates. However, such studies have been unable to examine in isolation the contributory effect of transparency on reading due to co-varying linguistic or socio-cultural factors. We first investigated the phonological properties of languages using the range of the world's orthographic systems (alphabetic; alphasyllabic; consonantal; syllabic; logographic), and found that, once geographical proximity is taken into account, phonological properties do not relate to orthographic system. We then explored the processing implications of orthographic variation by training a connectionist implementation of the triangle model of reading on the range of orthographic systems whilst controlling for phonological and semantic structure. We show that the triangle model is effective as a universal model of reading, able to replicate key behavioural and neuroscientific results. Importantly, the model also generates new predictions deriving from an explicit description of the effects of orthographic transparency on how reading is realised and defines the consequences of orthographic systems on reading processes.
Computational models can reflect the complexity of human behaviour by implementing multiple constraints within their architecture, and/or by taking into account the variety and richness of the environment to which the human is responding. We explore the second alternative in a model of word recognition that learns to map spoken words to visual and semantic representations of the words' concepts. Critically, we employ a phonological representation utilising coarse-coding of the auditory stream, to mimic early stages of language development that do not depend on individual phonemes being isolated in the input, an ability that may itself be a consequence of literacy development. The model was tested at different stages during training, and was able to simulate key behavioural features of word recognition in children: a developing effect of semantic information as a consequence of language learning, and a small but earlier effect of phonological information on word processing. We additionally tested the role of visual information in word processing, generating predictions for behavioural studies, showing that visual information could have a larger effect than semantics on children's performance, but that again this affects recognition later in word processing than phonological information. The model also provides further predictions for performance of a mature word recognition system in the absence of fine-coding of phonology, such as in adults who have low literacy skills. The model demonstrated that such phonological effects may be reduced but are still evident even when multiple distractors from various modalities are present in the listener's environment. The model demonstrates that complexity in word recognition can emerge from a simple associative system responding to the interactions between multiple sources of information in the language learner's environment.