2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221)
DOI: 10.1109/icassp.2001.940793

Hierarchical discriminant features for audio-visual LVCSR

Cited by 56 publications (53 citation statements)
References 9 publications
“…As such, visual speech recognition has become the focus of numerous research projects. Work includes investigations into modelling techniques and feature stream combination [2][3] [4][5] feature extraction [6] [7][8] and more recently head pose invariant lip-reading [9] [10]. One of the factors limiting the recognition accuracy of visual only speech recognition is the small number of possible lip shapes/movements in relation to the range of corresponding vocal sounds.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…In the first case, the audio and visual features are combined projecting them onto an audio-visual feature space, where traditional single-stream classifiers are used [33][34][35][36]. Decision fusion, on its turn, processes the streams separately and, at a certain level, combines the outputs of each single-modality classifier.…”
Section: Audio-visual Integration and Classification
Citation type: mentioning
confidence: 99%
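The two integration strategies contrasted in the statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the cited paper: feature fusion is reduced to plain concatenation (the projection onto a joint audio-visual space is omitted), decision fusion is shown as a weighted sum of per-class log scores, and the function names, the class labels, and the `audio_weight` value are hypothetical.

```python
import math


def feature_fusion(audio_feat, visual_feat):
    """Feature fusion (simplified): join both streams into one vector.

    A real system would typically project this joint vector further;
    that projection step is left out of this sketch.
    """
    return list(audio_feat) + list(visual_feat)


def decision_fusion(audio_scores, visual_scores, audio_weight=0.7):
    """Decision fusion: combine per-class log scores from two classifiers.

    audio_weight is a hypothetical stream weight in [0, 1]; the visual
    stream receives the remaining weight.
    """
    visual_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_scores[label]
        + visual_weight * visual_scores[label]
        for label in audio_scores
    }


def best_label(scores):
    """Return the class label with the highest combined score."""
    return max(scores, key=scores.get)


# Hypothetical per-class log-likelihoods from two single-modality classifiers.
audio_scores = {"yes": math.log(0.6), "no": math.log(0.4)}
visual_scores = {"yes": math.log(0.2), "no": math.log(0.8)}

fused_vector = feature_fusion([1.0, 2.0], [3.0])
combined = decision_fusion(audio_scores, visual_scores, audio_weight=0.7)
decision = best_label(combined)
```

In this sketch the stream weight controls how much the visual classifier can override the audio one. Setting `audio_weight=1.0` reduces decision fusion to the audio-only decision.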
“…Visual speech information can play a vital role for the improvement of natural and robust human-computer interaction [3,4]. Most published works in the areas of speech recognition and speaker recognition focus on speech under the noiseless environments and few published works focus on speech under noisy conditions [5,6].…”
Section: Introduction
Citation type: mentioning
confidence: 99%