2008
DOI: 10.1109/tcsvt.2008.2004924

An Automatic Lipreading System for Spoken Digits With Limited Training Data

Abstract: It is well known that visual cues of lip movement contain important speech-relevant information. This paper presents an automatic lipreading system for small vocabulary speech recognition tasks. Using the lip segmentation and modeling techniques we developed earlier, we obtain a visual feature vector composed of outer and inner mouth features from the lip image sequence for recognition. A spline representation is employed to transform the discrete-time sampled features from the video frames into the continuous…

Cited by 24 publications (6 citation statements)
References 13 publications
“…Lip image processing has attracted wide-spread research interest in recent years for its wide application in automatic visual speech recognition [1][2], visual speaker authentication [3][4][5], lip synchronization for facial animation [6], etc. Lip region segmentation, which is also referred to as lip segmentation, is the first and most crucial step in various lip-related applications [7].…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…In recent years, image processing techniques have been extensively developed for human lip recognition, which can automatically detect and analyse the unstable shape of human lips and distinguish in real time whether the user is speaking or not. Examples include audiovisual speech recognition (AVSR) [1], visual speech recognition (VSR) [2,3], speaker recognition [4][5][6], intelligent human-computer interaction (IHCI) [7], vision-based voice activity detection (VVAD), etc. Research in the field of speech technology has achieved remarkable results both at home and abroad.…”
Section: Literatures | Citation type: mentioning
confidence: 99%
“…There are 23 ALR architectures targeting digit or alphabet recognition since 2007. Looking at Tables 4, 5 and 6 we observe that most traditional systems use feature techniques based on image transforms [108,9,66,109,110] or shape and appearance models [56,111,112,7,113]. In Figure 4 we show i) the number of times that each feature technique has been integrated into ALR systems addressing digit or letter recognition; ii) the same for each classification method.…”
Section: Digit and Letter Recognition | Citation type: mentioning
confidence: 99%