2003
DOI: 10.1016/j.cub.2003.09.005

‘Putting the Face to the Voice’

Abstract: Speech perception provides compelling examples of a strong link between auditory and visual modalities. This link originates in the mechanics of speech production, which, in shaping the vocal tract, determine the movement of the face as well as the sound of the voice. In this paper, we present evidence that equivalent information about identity is available cross-modally from both the face and voice. Using a delayed matching to sample task, XAB, we show that people can match the video of an unfamiliar face, X, …

Cited by 144 publications (94 citation statements). References 14 publications.
“…While previous face-voice matching studies using 2AFC procedures have found no difference between face-first and voice-first performance (Kamachi et al., 2003; Lachs & Pisoni, 2004), our results using a same-different task suggest that people exhibit a bias to respond that a face and voice belong to the same identity, particularly when the face is presented before the voice. A performance asymmetry according to stimulus order is consistent with the previous literature.…”
Section: Face and Voice Matching (contrasting)
confidence: 95%
“…Mavica and Barenholtz (2013) tested whether people could use information from a voice to distinguish between two static images of different faces. Accuracy was significantly above chance level, despite contradictory results presented in previous studies (Kamachi et al., 2003; Lachs & Pisoni, 2004) suggesting that successful matching of faces and voices depends on the ability to encode dynamic properties of speaking (muted) faces (Mavica & Barenholtz, 2013). Previous face-voice matching studies (Kamachi et al., 2003; Krauss et al., 2002; Mavica & Barenholtz, 2013) have used a two-alternative forced-choice (2AFC) paradigm, which, unlike a same-different paradigm, does not model whether people are also able to correctly reject a match when a face and voice are from different people.…”
Section: Methods (mentioning)
confidence: 67%
“…If so, such predictions may occur at phonological or even pre-phonological levels of processing. For instance, phonology has been proposed as a common representational code for various aspects of speech perception (visual and acoustic) as well as production [22], [24], [30], [31], [32], [33]. Some evidence for a link between auditory and visual speech representations comes from Rosenblum et al. [31], who exposed participants, previously inexperienced in lip reading, to silent video clips of an actor producing speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…In a subsequent task, the same participants performed auditory word recognition in noise and were more accurate when the words were spoken by the same speaker they had previously experienced visually (but not heard). Another example of cross-modal transfer in speech comes from Kamachi et al. [33], who reported that people are able to match the identity of speakers across face and voice (i.e., cross-modally), which, according to the authors, is based on the link between perception and production of speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…Even beyond conventional speech, listeners are able to judge the size of musical pitch intervals produced by singers on the basis of visual information alone (Thompson & Russo, 2007). Subjects are able to recognize individual talkers on the basis of the correspondence between facial dynamics and speech acoustics in a delayed matching task with videos of unfamiliar faces and the sounds of unfamiliar voices (Kamachi, Hill, Lander, & Vatikiotis-Bateson, 2003). In tests of audiovisual speech perception, the intelligibility of speech in noise is greater when the talker’s natural head movements are present (Davis & Kim, 2006; Munhall et al., 2004).…”
(mentioning)
confidence: 99%