1996
DOI: 10.1007/3-540-61123-1_154
Real-time lip tracking for audio-visual speech recognition applications

Abstract: Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations, it is now possible to track human faces and parts of faces in real time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were deve…
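The abstract describes tracking a contour with a Kalman filter. As a minimal sketch of that idea, the following tracks a single contour control point with a linear constant-velocity Kalman filter; the state, dynamics, and noise settings are illustrative assumptions, not the paper's actual tracker.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter.

    x, P : state estimate and its covariance
    z    : new measurement
    F, H : dynamics and measurement matrices
    Q, R : process and measurement noise covariances
    """
    # Predict forward one frame.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the measurement.
    y = z - H @ x_pred                     # innovation
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])  # constant-velocity dynamics
H = np.array([[1.0, 0.0]])             # only position is observed
Q = 0.01 * np.eye(2)                   # process noise (assumed)
R = np.array([[0.25]])                 # measurement noise (assumed)

x = np.array([0.0, 0.0])               # [position, velocity]
P = np.eye(2)

# Track a point moving at unit speed with noisy position measurements.
rng = np.random.default_rng(0)
for t in range(1, 50):
    z = np.array([float(t) + rng.normal(0.0, 0.5)])
    x, P = kalman_step(x, P, z, F, H, Q, R)
# After 49 frames the velocity estimate should settle near 1.0.
```

In the paper's setting the state would instead hold the control-point parameters of the whole lip contour, with one such filter coupling all points through a learned shape model.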

Cited by 54 publications (30 citation statements)
References 19 publications
“…For example, a human face is composed of outer face contour, eyebrows, eyes, nose, and mouth. Analyzing the motion of structured deformable shapes has many real applications such as tracking human lips for speech recognition [1], locating human faces for face recognition [2], and medical applications such as tracking the endocardial wall [3]. The structured deformation is different from articulated motion.…”
Section: Introduction
confidence: 99%
“…In audio-visual speech recognition [17,19], visual features obtained by tracking the movement of lips and mouths are combined with audio features for improved speech recognition. In audio-visual object detection and tracking [3,8], synchronized visual foreground objects and audio background sounds are used for object detection [8].…”
Section: Introduction
confidence: 99%
“…A method based on b-splines and Kalman filters has been described in [12]. A stochastic dynamic model is learned from example sequences, which enhances tracking speed and robustness to distractions.…”
Section: Related Work
confidence: 99%
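The last statement mentions learning a stochastic dynamic model from example sequences. A hedged sketch of the simplest version of that idea: fit a first-order linear model x_{t+1} ≈ A x_t to a training sequence by least squares. The dimensions and the synthetic rotation dynamics below are illustrative assumptions, not the cited paper's formulation.

```python
import numpy as np

def learn_dynamics(sequence):
    """Least-squares fit of A in x_{t+1} = A x_t from one training sequence.

    sequence: array of shape (T, d); each row is a shape-parameter vector.
    """
    X_prev = sequence[:-1]   # (T-1, d)
    X_next = sequence[1:]    # (T-1, d)
    # Solve X_next ≈ X_prev @ A.T in the least-squares sense.
    A_T, *_ = np.linalg.lstsq(X_prev, X_next, rcond=None)
    return A_T.T

# Synthetic training sequence generated by a known rotation dynamic.
theta = 0.1
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
x = np.array([1.0, 0.0])
seq = [x]
for _ in range(200):
    x = A_true @ x
    seq.append(x)

A_hat = learn_dynamics(np.array(seq))
# On noiseless data the fit recovers the generating dynamics exactly.
```

Once learned, A (plus an estimated noise covariance) slots directly into the predict step of a Kalman filter, which is how a learned dynamic model speeds up and stabilizes contour tracking.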