2006
DOI: 10.1109/tip.2006.877528

Discriminative Analysis of Lip Motion Features for Speaker Identification and Speech-Reading

Abstract: There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2)…

Cited by 101 publications (37 citation statements)
References: 37 publications
“…This block is identical to that of an audio-only ASR system and the features most commonly used are perceptual linear predictive [16] or Mel frequency cepstral coefficients [17,18]. In parallel, the face of the speaker has to be localized from the video sequence and the region of the mouth detected and normalized before relevant features can be extracted [1,19]. Typically, both audio and visual features are extended to include some temporal information of the speech process.…”
Section: Audio-Visual Speech Recognition (mentioning)
Confidence: 99%
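As an illustration of the audio front end described in this statement (cepstral features extended with temporal information), here is a minimal sketch, assuming the librosa library and a placeholder file name; it is not taken from the cited papers.

import librosa
import numpy as np

# Load speech at a common 16 kHz sampling rate ("speech.wav" is a placeholder path).
y, sr = librosa.load("speech.wav", sr=16000)

# 13 Mel-frequency cepstral coefficients per frame.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Extend the features with first- and second-order deltas to capture temporal dynamics.
delta1 = librosa.feature.delta(mfcc, order=1)
delta2 = librosa.feature.delta(mfcc, order=2)
features = np.vstack([mfcc, delta1, delta2])  # shape: (39, n_frames)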
“…From this short literature review, we can conclude that the pixel based feature extraction techniques [1,3,5,14,17,20] are in general better fitted to encode the lips dynamics in a compact representation than the contour-based feature extraction methods [8,12,15]. Based on this conclusion, we formulated the visual speech recognition as the process of recognizing individual words based on a new manifold representation.…”
Section: Introduction (mentioning)
Confidence: 95%
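To make the pixel-based style of lip feature extraction concrete, here is a minimal sketch, assuming OpenCV and a mouth bounding box already supplied by a face/mouth detector; the ROI size and coefficient count are illustrative choices, not values from the cited papers.

import cv2
import numpy as np

def mouth_dct_features(frame_bgr, roi, size=32, keep=6):
    """Pixel-based lip features: grayscale mouth ROI -> 2-D DCT -> low-frequency block."""
    x, y, w, h = roi                                         # mouth bounding box from a detector
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    patch = cv2.resize(gray[y:y + h, x:x + w], (size, size)).astype(np.float32)
    coeffs = cv2.dct(patch)                                  # 2-D discrete cosine transform
    return coeffs[:keep, :keep].flatten()                    # compact low-frequency representation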
“…Among these, identity recognition based on lip movement as a biometric characteristic holds great potential, since data collection is relatively simple and equipment cost is low. Lip feature extraction is the most crucial step [1]. There are essentially two extraction approaches: static and dynamic.…”
Section: Introduction (mentioning)
Confidence: 99%
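To contrast the two approaches mentioned in this statement, the sketch below turns per-frame (static) lip features into dynamic ones by adding frame-to-frame differences; it is a generic NumPy illustration, not the method of the cited paper.

import numpy as np

def dynamic_lip_features(static_feats):
    """static_feats: (n_frames, dim) per-frame lip features, e.g. ROI DCT vectors.

    Returns (n_frames - 1, 2 * dim): each row is the current static vector
    concatenated with its frame-to-frame difference, a simple motion cue.
    """
    deltas = np.diff(static_feats, axis=0)        # temporal differences between frames
    return np.hstack([static_feats[1:], deltas])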