2006
DOI: 10.1007/11922162_3
DBN Based Models for Audio-Visual Speech Analysis and Recognition

Cited by 4 publications (5 citation statements)
References 7 publications
“…DBN has been applied in different areas such as audiovisual, speech, gesture recognition [2], [3], [4], medical diagnosis [5], and stock price forecasting [6].…”
Section: Methods
confidence: 99%
“…In the literature there are several works that use DBN to model classification tasks including speech recognition [2], [3], gesture recognition [4], medical prognostic [5], forecasting [6] among others. The specialty of this paper is next place prediction.…”
Section: Introduction
confidence: 99%
“…As a comparison, word recognition results of the tri-phone HMM and the single-stream DBN (SS-DBN) model are also given in Table 1, where SS-DBN is described in [9]. For the multi-stream HMM (implemented as a product HMM, with four audio and four video HMM states) at each SNR (0 dB to 30 dB), the stream exponent of the audio stream λ_a is varied from 0 to 1 in steps of 0.05, and the value of the stream exponent that maximizes the word accuracy is chosen. Audio stream exponents at different SNRs are given in Table 1.…”
Section: Experiments and Evaluation
confidence: 99%
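The exponent-selection procedure quoted above amounts to a grid search over the log-linear combination weight of the two streams. A minimal sketch, assuming the combination score = λ_a · ll_audio + (1 − λ_a) · ll_video and a hypothetical word-accuracy scorer (the data and function names below are illustrative stand-ins, not from the cited papers):

```python
import numpy as np

def word_accuracy(scores, labels):
    """Hypothetical scorer: fraction of utterances whose best-scoring
    word hypothesis matches the reference label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

def pick_stream_exponent(ll_audio, ll_video, labels, step=0.05):
    """Grid-search the audio stream exponent lambda_a over [0, 1].

    Combines per-word log-likelihoods log-linearly:
        score = lambda_a * ll_audio + (1 - lambda_a) * ll_video
    and keeps the exponent that maximizes word accuracy.
    """
    best_lam, best_acc = 0.0, -1.0
    for lam in np.arange(0.0, 1.0 + 1e-9, step):
        scores = lam * ll_audio + (1.0 - lam) * ll_video
        acc = word_accuracy(scores, labels)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam, best_acc

# Toy data: 4 utterances, 3-word vocabulary (synthetic, for illustration).
rng = np.random.default_rng(0)
ll_audio = rng.normal(size=(4, 3))
ll_video = rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 0])
lam, acc = pick_stream_exponent(ll_audio, ll_video, labels)
print(lam, acc)
```

In practice this search is run once per SNR condition on held-out data, which is why the quoted excerpt reports a different chosen exponent for each SNR.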
“…2, which starts with the detection and tracking of the speaker's face [9]. Since the mouth is the most important speech organ, the Bayesian Tangent Shape Model (BTSM) algorithm is used to automatically label the feature points of the speaker's lips [8]; for every image, 20 feature points covering the outer and inner contours of the mouth are obtained, as shown in Fig.…”
Section: Audio and Visual Feature Extraction
confidence: 99%
“…Visual feature extraction is given in Fig. 4, which starts with the detection and tracking of the speaker's face (Ravyse et al, 2006). Since the mouth is the most important speech organ, the contour of the lips is obtained through the Bayesian Tangent Shape Model (BTSM) (Zhou et al, 2003); for each image, 20 profile points covering the outer and inner contours of the mouth are obtained, as shown in Fig. 5.…”
Section: Audio and Visual Feature Extraction
confidence: 99%
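Both excerpts describe turning the 20 labelled lip points into a visual feature. A minimal sketch of one common way to do this, assuming a split into outer and inner contour points plus centroid/width normalization (the split and normalization are assumptions for illustration; the cited papers may use a different encoding):

```python
import numpy as np

def lip_feature_vector(outer, inner):
    """Flatten outer + inner lip contour points into one feature vector.

    outer, inner: arrays of (x, y) image coordinates; together they hold
    the 20 labelled feature points. Points are centred on the mouth
    centroid and scaled by mouth width, so the feature is roughly
    invariant to face position and camera distance (an assumed, common
    normalization).
    """
    pts = np.vstack([outer, inner]).astype(float)  # (20, 2)
    pts -= pts.mean(axis=0)                        # translation invariance
    width = outer[:, 0].max() - outer[:, 0].min()
    if width > 0:
        pts /= width                               # scale invariance
    return pts.ravel()                             # 40-dim visual feature

# Toy example: 12 outer + 8 inner points sampled on two ellipses.
t_out = np.linspace(0, 2 * np.pi, 12, endpoint=False)
t_in = np.linspace(0, 2 * np.pi, 8, endpoint=False)
outer = np.stack([30 * np.cos(t_out), 15 * np.sin(t_out)], axis=1) + 100
inner = np.stack([18 * np.cos(t_in), 8 * np.sin(t_in)], axis=1) + 100
feat = lip_feature_vector(outer, inner)
print(feat.shape)  # (40,)
```

A per-frame vector like this is what the DBN's visual stream would consume, typically after a further dimensionality reduction step.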