The natural ecology of human language is face-to-face interaction, comprising the exchange of a plethora of multimodal signals. Trying to understand the psycholinguistic processing of language in its natural niche raises new issues, first and foremost the binding of multiple, temporally offset signals under the tight time constraints posed by a turn-taking system. This might be expected to overload and slow our cognitive system, but the reverse is in fact the case. We propose cognitive mechanisms that may explain this phenomenon and call for a multimodal, situated psycholinguistic framework to unravel the full complexities of human language processing.

A Binding Problem at the Core of Language

Language as it is used in its central ecological niche, that is, in face-to-face interaction, is embedded in multimodal displays by both speaker and addressee. This is the niche in which it is learned, in which it evolved, and where the bulk of language usage occurs. Communication in this niche involves a complex orchestration of multiple articulators (see Glossary) and modalities: messages are auditory as well as visual, as they are spread across speech, nonspeech vocalizations, and the head, face, hands, arms, and torso. From the point of view of the recipient, this ought in principle to raise two serious computational challenges. First, not all bodily or facial movements are intended as part of the signal or content; the incidental but irrelevant movements must be set aside (we call this the segregation problem). Second, those that do seem to be part of the message have to be paired with their counterparts (as when we say 'There!' and point), and simultaneity alone turns out to be an unreliable cue (this is our binding problem). In this Opinion article, we ask how the multiple signals carried by multiple articulators in different modalities can be combined rapidly to build the phenomenology of a coherent message in the temporally demanding context of conversational speech.
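To make the segregation and binding problems concrete, the following minimal Python sketch (all event labels, timings, and the 0.4-second tolerance window are invented for illustration) pairs time-stamped gestures with nearby speech tokens. Note how the incidental head scratch is bound just as readily as the communicative point, which is the point made above: simultaneity alone cannot separate signal from noise.

    from dataclasses import dataclass

    @dataclass
    class Event:
        label: str
        onset: float  # seconds from utterance start

    # Invented, hand-timed events: the pointing gesture's apex precedes
    # its lexical affiliate ("there"), and the head scratch is incidental.
    speech = [Event("look", 0.0), Event("over", 0.3), Event("there", 0.6)]
    gestures = [Event("point", 0.5), Event("head_scratch", 0.32)]

    def bind(speech_events, gesture_events, window=0.4):
        """Pair each gesture with the nearest speech token, provided the
        temporal offset falls within the tolerance window."""
        pairs = []
        for g in gesture_events:
            nearest = min(speech_events, key=lambda s: abs(s.onset - g.onset))
            if abs(nearest.onset - g.onset) <= window:
                pairs.append((g.label, nearest.label))
        return pairs

    # The incidental head scratch is bound just as readily as the point:
    # simultaneity alone cannot solve the segregation problem.
    print(bind(speech, gestures))  # [('point', 'there'), ('head_scratch', 'over')]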
One reason for the apparent gulf between animal and human communication systems is that the focus has been on the presence or absence of language as a complex expressive system built on speech. But language normally occurs embedded within an interactional exchange of multimodal signals. If this larger perspective takes central focus, then it becomes apparent that human communication has a layered structure, where the layers may plausibly be assigned different phylogenetic and evolutionary origins, especially in light of recent thinking on the emergence of voluntary breathing and spoken language. This perspective helps us to appreciate the different roles that the different modalities play in human communication, as well as how they function as one integrated system despite their different roles and origins. It also offers possibilities for reconciling the ‘gesture-first’ hypothesis with the view that gesture and speech evolved together, hand in hand (or hand in mouth, rather) as one system.
Past research has investigated the impact of mutual knowledge on communication by focusing mainly on verbal communication. This study widens that focus to include both speech and gesture. Speakers completed a referential communication task with recipients who did or did not share with them knowledge about the size of certain entities. The results showed that when such common ground exists between interlocutors, it affects speakers' use of both gesture and speech. The main finding was that when speakers talked to recipients for whom the size information was new, they represented this information predominantly in gesture only or in gesture and speech. However, when speakers talked to recipients with whom they shared knowledge about the entities' size, they encoded this information mainly verbally, not gesturally. The results are interpreted with respect to past research on common ground and language use, the pragmatics of gesture, and theories of gesture production.
The home of human language use is face-to-face interaction, a context in which communicative exchanges are characterised not only by bodily signals accompanying what is being said but also by a pattern of alternating turns at talk. This transition between turns is astonishingly fast: typically, a mere 200 ms elapses between a current and a next speaker's contribution, meaning that comprehending, producing, and coordinating conversational contributions in time is a significant challenge. This raises the question of whether the additional information carried by bodily signals facilitates or hinders language processing in this time-pressured environment. We present analyses of multimodal conversations revealing that bodily signals appear to profoundly influence language processing in interaction: questions accompanied by gestures lead to shorter turn transition times (that is, to faster responses) than questions without gestures, and responses come earlier when gestures end before, rather than after, the question turn has ended. These findings hold even after taking into account prosodic patterns and other visual signals, such as gaze. The empirical findings presented here provide a first glimpse of the role of the body in the psycholinguistic processes underpinning human communication.
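As a hedged illustration of the turn-transition measure described above, the sketch below computes response onset minus question offset from hypothetical time-aligned annotations (the field layout and all values are placeholders, not the study's data) and compares questions with and without accompanying gestures.

    from statistics import mean

    # Hypothetical annotations per question-response pair:
    # (question_end, response_start, question_had_gesture), in seconds.
    turns = [
        (12.40, 12.55, True),
        (33.10, 33.45, False),
        (51.80, 51.92, True),
        (70.25, 70.71, False),
    ]

    def transition_ms(q_end, r_start):
        """Turn transition time in milliseconds; negative values mark overlap."""
        return (r_start - q_end) * 1000

    with_gesture = [transition_ms(q, r) for q, r, g in turns if g]
    without_gesture = [transition_ms(q, r) for q, r, g in turns if not g]

    print(f"mean transition with gesture:    {mean(with_gesture):.0f} ms")
    print(f"mean transition without gesture: {mean(without_gesture):.0f} ms")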
One of the most intriguing aspects of human communication is its turn-taking system. It requires the ability to process on-going turns at talk while planning the next, and to launch this next turn without considerable overlap or delay. Recent research has investigated the eye movements of observers of dialogs to gain insight into how we process turns at talk. More specifically, this research has focused on the extent to which we are able to anticipate the end of current and the beginning of next turns. At the same time, there has been a call for shifting experimental paradigms exploring social-cognitive processes away from passive observation toward on-line processing. Here, we present research that responds to this call by situating state-of-the-art technology for tracking interlocutors’ eye movements within spontaneous, face-to-face conversation. Each conversation involved three native speakers of English. The analysis focused on question–response sequences involving just two of those participants, thus rendering the third momentarily unaddressed. Temporal analyses of the unaddressed participants’ gaze shifts from current to next speaker revealed that unaddressed participants are able to anticipate next turns, and moreover, that they often shift their gaze toward the next speaker before the current turn ends. However, an analysis of the complex structure of turns at talk revealed that the planning of these gaze shifts virtually coincides with the points at which the turns first become recognizable as possibly complete. We argue that the timing of these eye movements is governed by an organizational principle whereby unaddressed participants shift their gaze at a point that appears interactionally most optimal: It provides unaddressed participants with access to much of the visual, bodily behavior that accompanies both the current speaker’s and the next speaker’s turn, and it allows them to display recipiency with regard to both speakers’ turns.
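The core timing measure in this gaze study can be sketched as follows, under invented annotations: for each question-response sequence, we take the onset of the unaddressed participant's gaze shift relative to the end of the current turn, with negative offsets indicating anticipatory shifts that begin before the turn ends.

    # Hypothetical annotations per sequence: (turn_end, gaze_shift_onset),
    # in seconds; values are placeholders, not the study's data.
    sequences = [
        (5.20, 4.95),
        (18.75, 18.60),
        (42.10, 42.30),
    ]

    offsets_ms = [(shift - end) * 1000 for end, shift in sequences]
    anticipatory = sum(1 for o in offsets_ms if o < 0)

    for o in offsets_ms:
        print(f"gaze shift at {o:+.0f} ms relative to turn end")
    print(f"{anticipatory} of {len(offsets_ms)} shifts began before the turn ended")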
Mimicry has been observed in a range of nonverbal behaviors, but only recently have researchers started to investigate mimicry in co-speech gestures. These gestures are considered to be crucially different from other aspects of nonverbal behavior due to their tight link with speech. This study provides evidence of mimicry in co-speech gestures in face-to-face dialogue, the most common forum of everyday talk. In addition, it offers an analysis of the functions that mimicked co-speech gestures fulfill in the collaborative process of creating a mutually shared understanding of referring expressions. The implications bear on theories of gesture production, research on grounding, and the mechanisms underlying behavioral mimicry.
Two self-paced reading-time experiments examined how ambiguous pronouns are interpreted under conditions that encourage shallow processing. In Experiment 1 we show that sentences containing ambiguous pronouns are processed at the same speed as those containing unambiguous pronouns under shallow processing, but more slowly under deep processing. We outline three possible models to account for the shallow processing of ambiguous pronouns. Two involve an initial commitment followed by possible revision, and the other involves a delay in interpretation. In Experiment 2 we provide evidence that supports the delayed model of ambiguous pronoun resolution under shallow processing. We found no evidence to support a processing system that makes an initial commitment to an interpretation of the pronoun when it is encountered. We extend the account of pronoun resolution proposed by Rigalleau, Caplan, and Baudiffier (2004) to include the treatment of ambiguous pronouns under shallow processing.
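The contrast between the two model families can be sketched in code (a toy illustration under assumed heuristics, not the authors' implementations): an immediate-commitment model does work at the pronoun and may later revise, whereas the delayed model leaves ambiguous pronouns unresolved, predicting no slowdown relative to unambiguous pronouns under shallow processing.

    def immediate_commitment(antecedents):
        """Commit to a referent as soon as the pronoun is read (a
        first-mention bias is assumed here purely for illustration);
        ambiguous cases remain open to later revision."""
        return {"interpretation": antecedents[0],
                "open_to_revision": len(antecedents) > 1}

    def delayed_interpretation(antecedents):
        """Leave ambiguous pronouns unresolved under shallow processing,
        so no extra work is done at the pronoun itself."""
        if len(antecedents) == 1:
            return {"interpretation": antecedents[0], "open_to_revision": False}
        return {"interpretation": None, "open_to_revision": False}

    # "John met Bill before he left": two candidate antecedents.
    print(immediate_commitment(["John", "Bill"]))
    print(delayed_interpretation(["John", "Bill"]))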