1996
DOI: 10.1007/978-3-662-13015-5_16

The Dynamics of Audiovisual Behavior in Speech

Abstract: While it is well-known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our rece…


Cited by 26 publications (21 citation statements: 0 supporting, 21 mentioning, 0 contrasting; citing publications span 1998–2022).
References 16 publications.
“…In particular, the authors made a strong argument for phonetic perception (see also Bernstein, 2003) rather than viseme clusters (as in Cohen et al. 1996 and Owens and Blazek 1985). This point is also strongly supported by the notion that "the moving vocal tract simultaneously shapes the acoustics and the motion of the face" (Vatikiotis-Bateson et al. 1996). Thus there is the necessity for scrutiny of visual phonetic articulation.…”
Section: Introduction (mentioning)
confidence: 87%
“…Both mouth shape (and the visibility of mouth parts) and mouth movement (the dynamics of mouth actions, including rate of speech) play their part. This insight is confirmed by a range of experimental findings18,20,22 and by findings in applied telematics, which show that speechreading accuracy for audiovisual inputs with auditory dynamic noise falls off as the frame rate (temporal resolution) of the display of the speaker's face drops from about 30 Hz to 8–12 Hz.24 In addition, seen rate of speech can be readily discriminated and can directly affect the identification of a heard speech token.5,6 But how does this work? Is one process subsumed in the other?…”
Section: Introduction (mentioning)
confidence: 76%
“…Historically, multimodal speech and lip movement research was driven by cognitive science interest in intersensory audio-visual perception and the coordination of speech output with lip and facial movements (Benoit and Le Goff, 1998; Bernstein and Benoit, 1996; Cohen and Massaro, 1993; Massaro and Stork, 1998; McGrath and Summerfield, 1985; McGurk and MacDonald, 1976; McLeod and Summerfield, 1987; Robert-Ribes et al., 1998; Sumby and Pollack, 1954; Summerfield, 1992; Vatikiotis-Bateson et al., 1996). Among the many contributions of this literature was a detailed classification of human lip movements (visemes) and the viseme-phoneme mappings that occur during articulated speech.…”
Section: Later Advanced Multimodal Interfaces (mentioning)
confidence: 99%
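
The viseme-phoneme mapping described in the statement above amounts, in data terms, to a many-to-one grouping: several acoustically distinct phonemes collapse into a single visually indistinguishable lip-shape class. A minimal sketch in Python, using a hypothetical viseme inventory for illustration (the class names and phoneme labels here are assumptions, not the inventories from Cohen et al. or Owens and Blazek):

```python
# Hypothetical viseme classes for illustration only; published inventories
# (e.g. Owens and Blazek 1985) differ in both number and membership.
VISEME_TO_PHONEMES = {
    "bilabial":    {"p", "b", "m"},    # lips pressed together
    "labiodental": {"f", "v"},         # lower lip against upper teeth
    "rounded":     {"w", "uw", "ow"},  # protruded, rounded lips
    "open":        {"aa", "ae", "ah"}, # open jaw, visible tongue body
}

def visually_confusable(phoneme_a: str, phoneme_b: str) -> bool:
    """True if two phonemes fall in the same viseme class, i.e. they are
    hard to tell apart by sight even though they differ acoustically."""
    return any(phoneme_a in group and phoneme_b in group
               for group in VISEME_TO_PHONEMES.values())

print(visually_confusable("p", "m"))  # True: both bilabial
print(visually_confusable("p", "f"))  # False: different lip shapes
```

This many-to-one structure is also why viseme-only models discard phonetic detail, which is the contrast the first citation statement draws in favor of phonetic perception.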
“…The cognitive science literature has provided information on the integration patterns that typify people's speech, lip, and facial movements (Ekman, 1992; Ekman and Friesen, 1978; Fridlund, 1994; Hadar et al., 1983; Massaro and Cohen, 1990; Stork and Hennecke, 1995; Vatikiotis-Bateson et al., 1996), speech with pen input, and speech with manual gestures (Kendon, 1980; McNeill, 1992; Oviatt et al., 1997). Early work on multimodal systems focused exclusively on processing point-and-speak integration patterns during selection actions.…”
Section: Multimodal Integration and Synchronization Patterns (mentioning)
confidence: 99%