2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
DOI: 10.1109/icassp.2001.940796

Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition

Cited by 27 publications
(20 citation statements)
References 10 publications
“…Such improvements have been typically demonstrated on databases of small duration, and, in most cases, limited to a very small number of speakers (mostly less than ten, and often single-subject) and to small vocabulary tasks [18], [21]. Common tasks typically include recognition of non-sense words [22], [23], isolated words [19], [24]–[30], connected digits [31], [32], letters [31], or of closed-set sentences [33], mostly in English, but also in French [22], [34], [35], German [36], [37], and Japanese [38], among others. Recently however, significant improvements have also been demonstrated for large vocabulary continuous speech recognition (LVCSR) [39], as well as cases of speech degraded due to speech impairment [40] or Lombard effects [29].…”
Section: Audio-only ASR Visual-only ASR (Automatic Speechreading
Citation type: mentioning
confidence: 99%
“…Affine invariant Fourier descriptors have been used for lip reading [10] and for recognition of aircrafts [11]. Usage of affine invariant Fourier descriptors in human posture estimation is a new approach especially to activity recognition.…”
Section: Affine Invariant Fourier Descriptors
Citation type: mentioning
confidence: 99%
“…The researchers then obtained visual features, namely the affine-invariant Fourier descriptors (AIFDs) [21], the DCT, the rotation-corrected DCT (rc-DCT) and the B-Spline template (BST) [19]. Due to their greater sensitivity to lip shape, the appearance-based features, DCT and rc-DCT, demonstrated good performance compared to that obtained using the shape-based features, AIFDs and BST.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
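
The citation statements above refer to affine-invariant Fourier descriptors (AIFDs) as shape-based features of a contour such as the lip outline. As a rough illustration of the underlying idea only, and not the method of the cited paper, the sketch below builds ratios of 2x2 determinants of Fourier coefficient pairs. A linear map scales every such determinant by the same factor, so the ratios cancel it, and a translation only changes the zero-frequency term, which is skipped. The function name `affine_invariant_descriptors`, its parameters, and the test contour are hypothetical. The sketch also assumes the same sample points, in the same order and with the same starting point, are available before and after the transform, which sidesteps the affine-invariant reparameterization a practical system would need.

```python
import numpy as np


def affine_invariant_descriptors(points, num_coeffs=6, p=1):
    """Simplified affine-invariant descriptors of a sampled closed contour.

    points: (N, 2) array of contour samples in a fixed order (assumption:
    the ordering and starting point are the same for all compared contours).
    """
    pts = np.asarray(points, dtype=float)
    # Fourier coefficients of the x and y coordinate sequences.
    X = np.fft.fft(pts[:, 0])
    Y = np.fft.fft(pts[:, 1])

    def delta(k):
        # Determinant of the coefficient pair (k, p). A linear map A
        # multiplies this value by det(A) for every k != 0.
        return X[k] * Y[p] - Y[k] * X[p]

    # Dividing by a reference determinant cancels the det(A) factor.
    ref = delta(p + 1)
    ks = range(p + 2, p + 2 + num_coeffs)
    return np.array([delta(k) / ref for k in ks])


# Hypothetical test contour: a smooth closed curve with several harmonics.
t = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
r = 1.0 + 0.3 * np.cos(3 * t) + 0.2 * np.sin(5 * t)
contour = np.column_stack([r * np.cos(t), 0.6 * r * np.sin(t)])

# Arbitrary non-singular affine map: linear part A plus a translation b.
A = np.array([[1.5, 0.4], [-0.3, 0.8]])
b = np.array([3.0, -2.0])
warped = contour @ A.T + b

d_original = affine_invariant_descriptors(contour)
d_warped = affine_invariant_descriptors(warped)
print(np.allclose(d_original, d_warped))
```

The printed comparison checks that the descriptors of the original and the warped contour agree up to floating-point error. The reference determinant must not be close to zero for the chosen contour; a different reference pair would be needed otherwise.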