The ecology of human language is face-to-face interaction, comprising cues such as prosody, co-speech gestures and mouth movements. Yet, this multimodal context is usually stripped away in experiments, as dominant paradigms focus on linguistic processing only. In two studies, we presented participants with video clips of an actress producing naturalistic passages while recording their electroencephalogram. We quantified the multimodal cues (prosody, gestures, mouth movements) and measured their effect on a well-established electroencephalographic marker of processing load in comprehension (the N400). We found that brain responses to words were affected by the informativeness of co-occurring multimodal cues, indicating that comprehension relies on both linguistic and non-linguistic cues. Moreover, brain responses were affected by interactions between the multimodal cues, indicating that the impact of each cue dynamically changes with the informativeness of the other cues. Thus, the results show that multimodal cues are integral to comprehension; our theories must therefore move beyond the limited focus on speech and linguistic processing.
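To make the analysis concrete, here is a minimal sketch of how single-trial N400 amplitudes could be regressed on the informativeness of each cue, including their interactions. The file name, column names and model structure below are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: regress single-trial N400 amplitude on the
# informativeness of each multimodal cue and their interactions.
# The file and column names below are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("word_level_data.csv")
# Expected columns (one row per word token):
#   subject  - participant identifier
#   n400     - single-trial N400 amplitude (microvolts)
#   prosody, gesture, mouth - z-scored informativeness of each cue

# Mixed-effects model with a random intercept per participant;
# the formula expands to all main effects and interaction terms.
model = smf.mixedlm("n400 ~ prosody * gesture * mouth",
                    data=df, groups=df["subject"]).fit()
print(model.summary())
```

A random intercept per participant is the simplest plausible structure here; a full analysis would likely also consider random slopes and item effects.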
Across disciplines, researchers are eager to gain insight into empirical features of abstract vs. concrete concepts. In this work, we provide a detailed characterisation of the distributional nature of abstract and concrete words across 16,620 English nouns, verbs and adjectives. Specifically, we investigate the following questions: (1) What is the distribution of concreteness in the contexts of concrete and abstract target words? (2) What are the differences between concrete and abstract words in terms of contextual semantic diversity? (3) How does the entropy of concrete and abstract word contexts differ? Overall, our studies show consistent differences in the distributional representation of concrete and abstract words, thus challenging existing theories of cognition and providing a more fine-grained description of their nature.
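To illustrate questions (1)-(3), the following is a minimal sketch of the three contextual measures, assuming pre-computed context word lists, co-occurrence counts, concreteness norms and word vectors; all input names are placeholders rather than the authors' materials.

```python
# Minimal sketch of the three contextual measures; `context_counts`,
# `vectors` and `concreteness` are placeholder inputs.
import numpy as np
from itertools import combinations

def mean_context_concreteness(context_words, concreteness):
    """Question 1: average concreteness rating of a target's context words."""
    return float(np.mean([concreteness[w] for w in context_words]))

def semantic_diversity(context_words, vectors):
    """Question 2: mean pairwise cosine distance between context vectors."""
    dists = []
    for w1, w2 in combinations(context_words, 2):
        v1, v2 = vectors[w1], vectors[w2]
        cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
        dists.append(1.0 - cos)
    return float(np.mean(dists))

def context_entropy(context_counts):
    """Question 3: Shannon entropy of the target's context distribution."""
    counts = np.array(list(context_counts.values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```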
While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to what extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23-participant eye-tracking dataset, MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural network (CNN) and XLNet Transformer architectures. We find that, for the LSTM and CNN models, higher similarity to human attention significantly correlates with higher performance. However, this relationship does not hold for the XLNet models, despite XLNet performing best on this challenging task. Our results suggest that different architectures learn rather different neural attention strategies and that similarity of neural to human attention does not guarantee the best performance.
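One plausible way to operationalize "similarity to human attention" is a rank correlation between per-token model attention and per-token fixation duration. The sketch below assumes tokens and fixations have already been aligned; all names and values are illustrative.

```python
# Minimal sketch: rank correlation between model attention weights and
# human fixation durations on the same, pre-aligned token sequence.
import numpy as np
from scipy.stats import spearmanr

def attention_similarity(model_attn, fixation_durations):
    """Spearman correlation (rank-based, hence scale-invariant) between
    per-token model attention and per-token human reading time."""
    rho, p = spearmanr(np.asarray(model_attn, dtype=float),
                       np.asarray(fixation_durations, dtype=float))
    return rho, p

# Toy example for a five-token sentence (values are illustrative):
rho, p = attention_similarity([0.10, 0.40, 0.20, 0.20, 0.10],
                              [120, 480, 200, 260, 90])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```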
The natural ecology of human language is face-to-face interaction, comprising cues such as co-speech gestures, mouth movements and prosody that are tightly synchronized with speech. Yet, this rich multimodal context is usually stripped away in experimental studies, as the dominant paradigm focuses on speech alone. We ask how these audio-visual cues impact brain activity during naturalistic language comprehension, how they are dynamically orchestrated, and whether they are organized hierarchically. We quantified each cue in video clips of a speaker and used a well-established electroencephalographic marker of comprehension difficulty, an event-related potential peaking around 400 ms after word onset. We found that multimodal cues always modulated brain activity in interaction with speech, that their impact dynamically changed with their informativeness, and that there is a hierarchy: prosody showed the strongest effect, followed by gestures and mouth movements. Thus, this study provides a first snapshot of how the brain dynamically weights audio-visual cues in real-world language comprehension.

… frame theories of natural language processing, because if some multimodal cues (e.g., gesture or prosody) always contribute to processing, this would imply that our current speech-only focus is too narrow, if not misleading. Second, we need to understand the dynamics of online multimodal comprehension. In particular, to provide mechanistic accounts of language comprehension, it is necessary to establish how the weight of a certain cue dynamically changes depending upon the context (e.g., whether meaningful hand gestures are weighted more when the prior linguistic context is less informative and/or when mouth movements are less informative). Finally, it is important to establish whether there is a stable hierarchical organization of cues (e.g., prior linguistic context may always be weighted more than gestures, which are in turn weighted more than mouth movements).

Prosody, gesture and mouth movements as predictors of upcoming words: the state of the art

Accentuation (i.e., prosodic stress, characterized as higher pitch that makes words acoustically prominent) marks new information [10]. Many behavioural studies have revealed that comprehension is facilitated by appropriate accentuation (new information is accentuated, and old information de-accentuated) [11,12]. Incongruence between the presence of prosodic accentuation and the newness of information increases processing difficulty, inducing increased activation in the left inferior frontal gyrus, interpreted as increased phonological and semantic processing difficulty [13]. In electrophysiological (EEG) studies, such a mismatch elicits a more negative N400 (an event-related potential (ERP) peaking negatively around 400 ms after word presentation over central-parietal areas [14], which has been argued to index prediction in language comprehension [2]) than appropriate accentuation does [15-20]. …
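For concreteness, an ERP measure like the one described above is often quantified as the mean amplitude in a post-onset time window over centro-parietal electrodes. The sketch below makes that explicit; the window, channel selection and array layout are illustrative assumptions, not the authors' exact parameters.

```python
# Minimal sketch: single-trial N400 amplitude from epoched EEG.
# `epochs` is a hypothetical numpy array of shape
# (n_trials, n_channels, n_times), baseline-corrected, with the time
# axis starting at word onset and sampled at `sfreq` Hz.
import numpy as np

def n400_amplitude(epochs, sfreq, channel_idx, tmin=0.3, tmax=0.5):
    """Mean amplitude in the tmin-tmax window (seconds) after word
    onset, averaged over the selected centro-parietal channels."""
    start, stop = int(tmin * sfreq), int(tmax * sfreq)
    window = epochs[:, channel_idx, start:stop]
    return window.mean(axis=(1, 2))  # one value (microvolts) per trial

# Example: channels 10-12 standing in for a centro-parietal cluster.
# n400 = n400_amplitude(epochs, sfreq=500, channel_idx=[10, 11, 12])
```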
In recent years, both cognitive and computational research has provided empirical analyses of the contextual co-occurrence of concrete and abstract words, yielding a partially inconsistent picture. In this work we provide a more fine-grained description of the distributional nature of the corpus-based interaction of verbs and nouns within subcategorisation, by investigating the concreteness of verbs and nouns that stand in a specific syntactic relationship with each other, i.e., subject, direct object, and prepositional object. Overall, our experiments show consistent patterns in the distributional representation of subcategorising and subcategorised concrete and abstract words. At the same time, the studies reveal empirical evidence for why contextual abstractness represents a valuable indicator for automatic non-literal language identification.
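As an illustration of how such syntactically related verb-noun pairs might be harvested, here is a minimal sketch using a dependency parser and a table of concreteness norms; the spaCy model and the norms lookup are stand-ins, not the authors' exact setup.

```python
# Minimal sketch: collect verb-direct-object pairs and look up their
# concreteness. The parser model and the `concreteness` dict (word ->
# rating, e.g., on a 1-5 scale) are placeholders.
import spacy

nlp = spacy.load("en_core_web_sm")  # any English dependency parser

def verb_object_pairs(texts, concreteness):
    """Yield (verb, noun, verb_rating, noun_rating) for each direct
    object relation where both lemmas have a concreteness rating."""
    for doc in nlp.pipe(texts):
        for tok in doc:
            if tok.dep_ == "dobj" and tok.pos_ == "NOUN":
                verb, noun = tok.head.lemma_, tok.lemma_
                if verb in concreteness and noun in concreteness:
                    yield verb, noun, concreteness[verb], concreteness[noun]
```

The same loop extends to subjects and prepositional objects by matching the corresponding dependency labels (e.g., "nsubj", "pobj").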