Frame rate and viseme analysis for multimedia applications

Williams, J.J.; Rutledge, Janet C.; Garstecki, Dean C.; Katsaggelos, Aggelos K.

doi:10.1109/mmsp.1997.602606

Cited by 17 publications

(13 citation statements)

References 6 publications

“…-linguistic -the classes of visemes are defined on the basis of an intuitive linguistic classification of groups of phonemes according to their expected visual realization, -data driven -the classes of visemes are defined on the basis of data acquired through parameter extraction and clustering [40].…”

Section: Viseme Classification Methods (mentioning)

confidence: 99%

A comparative study of English viseme recognition methods and algorithms

Jachimski

Czyżewski

Ciszewski

2017

Multimed Tools Appl

The paper considers an elementary visual unit, the viseme, in the context of preparing the feature vector that serves as the main visual input component of Audio-Visual Speech Recognition systems. The aim of the presented research is to review various approaches to the problem, implement algorithms proposed in the literature and compare their effectiveness. In the course of the study, an optimal feature vector construction and an appropriate classifier were sought. The experiments were conducted on a spoken corpus in which speech was represented both acoustically and visually. The extracted features were of three types: geometrical, textural and mixed. The features were processed with classification algorithms based on Hidden Markov Models and Sequential Minimal Optimization. Tests were carried out on video material recorded with native English speakers who read a specially prepared list of commands. The obtained results are discussed in the paper.
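
The "data driven" route mentioned in the citation statement above (parameter extraction followed by clustering) can be illustrated with a small sketch. It is not the method of either paper: the feature values, the phoneme set, the choice of k-means and the function names `kmeans` and `cluster_visemes` are all assumptions made for illustration.

```python
import math

# Hypothetical mean lip features per phoneme: (normalised width, normalised height).
# These numbers are invented for illustration, not measured data.
PHONEME_FEATURES = {
    "p": (0.50, 0.02),
    "a": (0.55, 0.40),
    "o": (0.30, 0.30),
    "b": (0.52, 0.03),
    "m": (0.51, 0.02),
    "u": (0.28, 0.28),
}


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means. Initial centroids are simply the first k points,
    a simplification that keeps the sketch deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_visemes(features, k):
    """Map each phoneme to a cluster index that plays the role of a viseme class."""
    phonemes = list(features)
    labels, _ = kmeans([features[p] for p in phonemes], k)
    return dict(zip(phonemes, labels))


viseme_of = cluster_visemes(PHONEME_FEATURES, 3)
```

With these made-up features the bilabials end up in one class and the rounded vowels in another. The cluster indices themselves carry no meaning; only the grouping does.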

“…17 and 18 three most important parameters used in many implementations during parameter extraction from the lip area are shown [1,22,28,40]. They are geometrical parameters: the outer horizontal aperture, the outer vertical aperture and the angle of lip opening.…”

Section: Preparation Of Visual Feature Parameter Vector (mentioning)

confidence: 99%

A comparative study of English viseme recognition methods and algorithms

Jachimski

Czyżewski

Ciszewski

2017

Multimed Tools Appl
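
The three geometrical parameters named in the statement above (outer horizontal aperture, outer vertical aperture, angle of lip opening) could be computed from four outer-lip landmarks roughly as follows. This is a minimal sketch under stated assumptions: the landmark layout, the function name `lip_parameters`, and in particular the definition of the opening angle (measured at the left mouth corner between the directions to the upper and lower lip midpoints) are not taken from the cited papers.

```python
import math


def lip_parameters(left, right, top, bottom):
    """Return (horizontal aperture, vertical aperture, opening angle in degrees).

    left/right: mouth-corner landmarks; top/bottom: midpoints of the outer
    upper and lower lip contours. All are (x, y) tuples in the same units.
    The angle definition used here is an assumption made for illustration.
    """
    width = math.dist(left, right)    # outer horizontal aperture
    height = math.dist(top, bottom)   # outer vertical aperture

    # Vectors from the left corner to the upper and lower lip midpoints.
    v_up = (top[0] - left[0], top[1] - left[1])
    v_down = (bottom[0] - left[0], bottom[1] - left[1])

    # Unsigned angle between the two vectors via atan2(cross, dot).
    cross = v_up[0] * v_down[1] - v_up[1] * v_down[0]
    dot = v_up[0] * v_down[0] + v_up[1] * v_down[1]
    angle = math.degrees(abs(math.atan2(cross, dot)))
    return width, height, angle


# Hypothetical landmarks for a symmetric, slightly open mouth.
width, height, angle = lip_parameters((0, 0), (4, 0), (2, 1), (2, -1))
```

Stacking these three values per video frame gives one simple form of geometrical feature vector.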

“…For instance (Dupont & Luettin, 2000) and (Luettin et al, 1996) combine ASM with PCA features and (Chiou & Hwang, 1997) combines snake features with PCA. It was shown that the tongue, teeth and cavity have great influence on lip reading (Williams et al, 1998), therefore, the addition of these appearance related elements has significant influence on the performance of lip reading (Chitu et al, 2007). A special example is the so called Active Appearance Models (AAM) (Cootes et al, 1998) which combines the ASM method with texture based information to accurately detect the shape of the mouth or the face.…”

Section: Feature Vectors Definition (mentioning)

confidence: 99%

“…The teeth, the tongue and the cavity were shown to be of great importance for lip reading by humans (Williams et al, 1998). Also other face elements were shown to be important during face to face communication; however, their exact influence is not completely elucidated.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic Visual Speech Recognition

Chitu

Rothkrantz

2012

Speech Enhancement, Modeling and Recognition- Algorithms and Applications

“…The concept of visual phoneme does not suggest an explicit definition of lips' structure during phoneme utterance. The visemes are formed based on human perceptions which are categorized using confusion matrix where the most accurately detected visemes form a phoneme-viseme table (Williams, Rutledge, Garstecki, & Katsaggelos, 1997). The deficiency of this method can be observed by the fact that there are various phoneme-viseme tables used (Goldschen, Garcia, & Petajan, 1994;Hazen, Saenko, La, & Glass, 2004;Jiang, Alwan, Auer, & Bernstein, 2001).…”

Section: Introduction (mentioning)

confidence: 99%
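
One way to read the statement above is that phonemes which viewers frequently confuse with each other are merged into a single viseme. The sketch below illustrates that reading only; it does not reproduce the procedure of Williams et al. The confusion counts, the threshold and the function name `group_by_confusion` are hypothetical.

```python
def group_by_confusion(labels, confusion, threshold):
    """Merge phonemes whose mean mutual confusion rate reaches `threshold`.

    confusion[i][j] counts how often phoneme labels[i] was presented and
    labels[j] was perceived. Every row is assumed to have a non-zero total.
    """
    parent = {p: p for p in labels}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    totals = [sum(row) for row in confusion]
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            rate = 0.5 * (
                confusion[i][j] / totals[i] + confusion[j][i] / totals[j]
            )
            if rate >= threshold:
                parent[find(labels[i])] = find(labels[j])

    groups = {}
    for p in labels:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Invented counts, 20 presentations per phoneme.
PHONEMES = ["p", "b", "m", "f", "v"]
CONFUSION = [
    [10, 6, 4, 0, 0],
    [7, 9, 4, 0, 0],
    [5, 5, 10, 0, 0],
    [0, 0, 0, 12, 8],
    [0, 0, 0, 9, 11],
]

viseme_groups = group_by_confusion(PHONEMES, CONFUSION, threshold=0.2)
```

Changing the threshold changes the resulting table, which is consistent with the statement's remark that several different phoneme-viseme tables are in use.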

A Novel Approach for Allocating Mathematical Expressions to Visual Speech Signals

2015

In this article, visual speech information modeling analysis by explicit mathematical expressions coupled with words' phonemic structure is presented. The visual information is obtained from deformation of lips' dimensions during articulation of a set of words that is called visual speech sample set. The continuous interpretation of the lips' movement has been provided using Barycentric Lagrange Interpolation producing a unique mathematical expression named visual speech signal. Hierarchical analysis of the phoneme sequences has been applied for words' categorization to organize the database properly. The visual samples were extracted from three visual feature points chosen on the lips via an experiment in which two individuals pronounced the aforementioned words. The simulation results show that each individual word can be represented by a mathematical expression or visual speech signal whereas the sample sets can also be derived from the same mathematical expression, and this is a significant improvement over the popular statistical methods.
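
The abstract above names barycentric Lagrange interpolation as the tool for turning sampled lip dimensions into a continuous curve. A minimal sketch of the standard barycentric formula follows. The sample times and lip-height values are invented, and the function names are illustrative rather than taken from the article.

```python
def barycentric_weights(xs):
    """Weights w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes xs."""
    weights = []
    for j, xj in enumerate(xs):
        w = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                w /= xj - xk
        weights.append(w)
    return weights


def barycentric_interpolate(xs, ys, weights, x):
    """Evaluate the interpolating polynomial at x with the barycentric formula."""
    numerator = 0.0
    denominator = 0.0
    for xj, yj, wj in zip(xs, ys, weights):
        if x == xj:
            return yj  # exactly on a node: return the sample itself
        term = wj / (x - xj)
        numerator += term * yj
        denominator += term
    return numerator / denominator


# Hypothetical samples: frame times (s) and lip height (arbitrary units).
times = [0.0, 1.0, 2.0, 3.0]
lip_height = [0.0, 1.0, 4.0, 9.0]  # chosen as t**2 so the result is easy to check

weights = barycentric_weights(times)
mid_value = barycentric_interpolate(times, lip_height, weights, 1.5)
```

The weights depend only on the sample times, so they can be computed once and reused for every evaluation point. On equally spaced nodes a high-degree interpolant can oscillate near the ends of the interval, so a small number of samples per segment is the safer choice for a sketch like this.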
