2015
DOI: 10.1007/s00530-015-0499-9
Audio-visual speech recognition integrating 3D lip information obtained from the Kinect

Cited by 21 publications (10 citation statements). References 12 publications.
“…Also called a match pair, in which both samples in a pair belong to the same identity; also called a non-match pair, in which the samples in a pair belong to different identities…”
mentioning
confidence: 99%
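The match/non-match pair terminology above can be made concrete with a short sketch. Assuming labeled samples are available as (identity, feature) tuples (the function and variable names here are illustrative, not from the paper), pair generation is just an identity comparison over all sample combinations:

```python
from itertools import combinations

def make_pairs(samples):
    """Build match (genuine) and non-match (impostor) pairs.

    samples: list of (identity, feature) tuples.
    A match pair holds two samples of the same identity;
    a non-match pair holds samples of different identities.
    """
    match, non_match = [], []
    for (id_a, feat_a), (id_b, feat_b) in combinations(samples, 2):
        if id_a == id_b:
            match.append((feat_a, feat_b))
        else:
            non_match.append((feat_a, feat_b))
    return match, non_match

data = [("alice", 0.1), ("alice", 0.2), ("bob", 0.9)]
m, n = make_pairs(data)
# m -> [(0.1, 0.2)]
# n -> [(0.1, 0.9), (0.2, 0.9)]
```

Match pairs drive the genuine-score distribution and non-match pairs the impostor-score distribution in a verification evaluation.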
“…Palecek [9] proposed depth-based active appearance model (AAM) features and improved the accuracy over DCT. Wang et al. [10] used features based on 3D lip points obtained from the Kinect. These methods are more suitable for real applications.…”
Section: Related Work
mentioning
confidence: 99%
“…Recently, depth cameras such as the Microsoft Kinect have become available at low cost. They have also been used for multimodal speech recognition [8], [9], [10]. In this study, we aim to improve the performance of multimodal speech recognition using depth cameras.…”
Section: Introduction
mentioning
confidence: 99%
“…In recent years, various AVSR modeling techniques [4,5,6,7,8,9,10] have been developed and yielded an impressive improvement over the ASR systems using only audio in an adverse environment. Conventional AVSR systems based on these approaches require highly specialized audio-visual (AV) data in both system training and evaluation.…”
Section: Introduction
mentioning
confidence: 99%
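The excerpt above describes AVSR systems that combine audio and visual streams to outperform audio-only ASR in noise. A common combination strategy is weighted late (decision) fusion of per-stream log-likelihoods; the sketch below is a generic illustration of that idea, with the weight `alpha` as an assumed tuning parameter rather than anything specified in the cited work:

```python
def late_fusion_score(audio_loglik, visual_loglik, alpha=0.7):
    """Weighted log-likelihood combination (late fusion) for AVSR.

    alpha weights the audio stream, (1 - alpha) the visual stream;
    in practice alpha is lowered as acoustic noise increases.
    """
    return alpha * audio_loglik + (1 - alpha) * visual_loglik

score = late_fusion_score(-10.0, -20.0, alpha=0.5)
# equal weighting -> -15.0
```

The recognizer would pick the word hypothesis maximizing this fused score across both streams.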