Visual speech recognition: a solution from feature extraction to words classification

Silveira, Luiz Gonzaga da; Facon, Jacques; Borges, Díbio Leandro

doi:10.1109/sibgra.2003.1241036

Cited by 29 publications

(16 citation statements)

References 11 publications

“…This approach is based upon the following ideas: when a person is speaking, the human face is quiescent relative to the camera; the lip motion in an image sequence presents high frequency in comparison to other parts of the human face [11]. An image sequence of Mandarin Chinese words' pronunciations is shown in …”

Section: Lip Features (mentioning)

confidence: 99%

Geometrical and Pixel Based Lip Feature Fusion in Speech Synthesis System Driven by Visual-speech

Wang, 2010

2010 Second International Conference on Computational Intelligence and Natural Computing

Lipreading is applied to synthesize speech for speech-impaired people. To obtain a higher recognition rate, feature-level data fusion with weighting coefficients is used to integrate lip information from different kinds of lip features. Experiments are carried out with HMMs with different numbers of states and Gaussian mixture components on a small database for the speaker-dependent case. The most important conclusion from the recognition results is that the integrated discriminative vector obtained after feature fusion outperforms the geometrical feature vector alone, the DCT descriptor vector alone, and the DCT coefficient vector alone with a 4-state, 16-Gaussian-mixture-component HMM. Compared with cascading the geometrical feature vector with the DCT descriptors, cascading it with the DCT coefficients integrates more information about the lip region, and the recognition rate improves by as much as 3.18% with the best weighting coefficients (m:n = 1.5:1).
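As a rough illustration of feature-level fusion with weighting coefficients, the sketch below scales two feature vectors and concatenates them. The abstract does not say which feature set receives m and which receives n, nor whether the features are normalized first, so the function name `fuse_features`, the assignment of m to the geometric features, and the sample values are all assumptions made for illustration only.

```python
from typing import List, Sequence


def fuse_features(
    geometric: Sequence[float],
    dct: Sequence[float],
    m: float = 1.5,
    n: float = 1.0,
) -> List[float]:
    """Weighted concatenation of two feature vectors for one video frame.

    Assumption: m scales the geometric features and n scales the DCT
    coefficients; the abstract only reports the ratio m:n = 1.5:1.
    """
    return [m * g for g in geometric] + [n * d for d in dct]


# Hypothetical per-frame features (made-up values, not from the paper).
geometric_features = [2.0, 4.0]      # e.g. lip width and height
dct_features = [1.0, -2.0, 0.5]      # e.g. low-order DCT coefficients
fused = fuse_features(geometric_features, dct_features)
print(fused)
```

The fused vector would then serve as the per-frame observation for a classifier such as an HMM. Only the ratio m:n matters if the classifier is insensitive to a global scale.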

“…In this regard, the feature extraction techniques that have been applied in the development of VSR systems can be divided into two main categories, shape-based and intensity based. In general, the shape-based feature extraction techniques attempt to identify the lips in the image based either on geometrical templates that encode a standard set of mouth shapes [17] or on the application of active contours [3]. Since these approaches require extensive training to sample the spectrum of mouth shapes, recently the feature extraction has been carried out in the intensity domain.…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition

Ghita, Sutherland, et al. 2010

IPSJ Transactions on Computer Vision and Applications

Abstract. This paper presents the development of a novel visual speech recognition (VSR) system based on Hidden Markov Models (HMM) and a new representation, referred to in this paper as the Visual Speech Unit (VSU), that extends the standard viseme concept. Visemes have been regarded as the smallest visual speech elements in the visual domain and have been widely applied to model visual speech, but they are problematic when applied to continuous visual speech recognition. To circumvent the problems associated with standard visemes, we propose a new visual speech representation that includes not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. To fully evaluate the appropriateness of the proposed representation, an extensive set of experiments has been conducted to compare the performance of the visual speech units with that of the standard MPEG-4 visemes. The experimental results indicate that the developed VSR application achieved up to 90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only in the range 62-72%.

“…The visual information is effective to improve the performance of recognition accuracy in noisy environments. For lip reading, some researchers proposed the method using the frontal face or side face image [9,10,5,3], or combining visual and auditory information [6,7,5,3]. In this paper, we focused to investigate the lip region and feature for lip reading.…”

Section: Introduction (mentioning)

confidence: 99%

“…For instance, there are Japanese [7,8], English [6,5], French [9], and Portuguese [10], etc. However, there is no research to refer the language and the recognition method.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of efficient lip reading method for various languages

Saitoh, Morishita, Konishi, 2008

2008 19th International Conference on Pattern Recognition

Traditional research has targeted only one language, and no research has related the language to the recognition method. Moreover, many model-based methods use only the external lip or intraoral region, so the tooth and tongue regions are not reflected in the features. This paper analyzes efficient lip reading methods for various languages. First, we apply an active appearance model and simultaneously extract the external and internal lip contours. Then, the tooth and intraoral regions are detected. Various features from five regions are fed to the recognition process. We set four languages as the recognition targets and recorded twenty words for each language. As a result, the proposed trajectory feature, based on three shape features (the area and aspect ratio of the internal lip region and the area of the intraoral region), obtained the highest recognition rate of 93.6% compared with traditional methods and other regions.
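The shape features named in this abstract (region area and aspect ratio) can be sketched for a single frame as follows. This is a minimal illustration, not the authors' method: it assumes the contour is available as an ordered list of (x, y) points, computes area with the shoelace formula, and defines aspect ratio as bounding-box width over height. The function names and the sample contour are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def contour_area(points: List[Point]) -> float:
    """Area of a closed polygon via the shoelace formula."""
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to the first.
    for (x0, y0), (x1, y1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def aspect_ratio(points: List[Point]) -> float:
    """Bounding-box width divided by height (an assumed definition)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return width / height if height else float("inf")


# Hypothetical inner-lip contour for one frame: a 4 x 2 rectangle.
inner_lip = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
frame_features = (contour_area(inner_lip), aspect_ratio(inner_lip))
print(frame_features)
```

Collecting such a tuple for every frame of a word yields a trajectory through feature space, which is one plausible reading of the "trajectory feature" described above.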
