2003
DOI: 10.1016/j.cub.2003.09.005

‘Putting the Face to the Voice’

Abstract: Speech perception provides compelling examples of a strong link between auditory and visual modalities. This link originates in the mechanics of speech production, which, in shaping the vocal tract, determine the movement of the face as well as the sound of the voice. In this paper, we present evidence that equivalent information about identity is available cross-modally from both the face and voice. Using a delayed matching to sample task, XAB, we show that people can match the video of an unfamiliar face, X, …

Cited by 144 publications (94 citation statements). References 14 publications.
“…While previous face-voice matching studies using 2AFC procedures have found no difference between face-first and voice-first performance (Kamachi et al., 2003; Lachs & Pisoni, 2004), our results using a same-different task suggest that people exhibit a bias to respond that a face and voice belong to the same identity, particularly when the face is presented before the voice. A performance asymmetry according to stimulus order is consistent with the previous literature.…”
Section: Face and Voice Matching (contrasting)
confidence: 95%
“…Mavica and Barenholtz (2013) tested whether people could use information from a voice to distinguish between two static images of different faces. Accuracy was significantly above chance level, despite contradictory results presented in previous studies (Kamachi et al., 2003; Lachs & Pisoni, 2004) suggesting that successful matching of faces and voices depends on the ability to encode dynamic properties of speaking (muted) faces (Mavica & Barenholtz, 2013). Previous face-voice matching studies (Kamachi et al., 2003; Krauss et al., 2002; Mavica & Barenholtz, 2013) have used a two-alternative forced-choice (2AFC) paradigm, which, unlike a same-different paradigm, does not model whether people are also able to correctly reject a match when a face and voice are from different people.…”
Section: Methods (mentioning)
confidence: 67%
“…If so, such predictions may occur at phonological or even pre-phonological levels of processing. For instance, phonology has been proposed as a common representational code for various aspects of speech perception (visual and acoustic) as well as production [22], [24], [30], [31], [32], [33]. Some evidence for a link between auditory and visual speech representations comes from Rosenblum et al. [31], who exposed participants, previously inexperienced in lip reading, to silent video clips of an actor producing speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…In a subsequent task, the same participants performed auditory word recognition in noise and were more accurate when the words were spoken by the same speaker they had previously experienced visually (but not heard). Another example of cross-modal transfer in speech comes from Kamachi et al. [33], who reported that people are able to match the identity of speakers across face and voice (i.e., cross-modally), which, according to the authors, is based on the link between perception and production of speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Even beyond conventional speech, listeners are able to judge the size of musical pitch intervals produced by singers on the basis of visual information alone (Thompson & Russo, 2007). Subjects are able to recognize individual talkers on the basis of the correspondence between facial dynamics and speech acoustics in a delayed matching task with videos of unfamiliar faces and the sounds of unfamiliar voices (Kamachi, Hill, Lander, & Vatikiotis-Bateson, 2003). In tests of audiovisual speech perception, the intelligibility of speech in noise is greater when the talker’s natural head movements are present (Davis & Kim, 2006; Munhall et al., 2004).…”
(mentioning)
confidence: 99%