2001
DOI: 10.1006/dspr.2001.0397

Adaptive Fusion of Speech and Lip Information for Robust Speaker Identification

Cited by 48 publications (35 citation statements)
References 28 publications
“…For audiovisual fusion, we adopted the score dispersion as an SCM approach in the baseline fusion scheme. Assuming K speakers, the fusion procedure is as follows [1], [2]. a) Generate the audio and video log-likelihood scores through individual classifiers for the input of AV fusion; …”
Section: Baseline AVSI System: Score-Based Fusion (citation type: mentioning)
confidence: 99%
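The fusion procedure quoted above breaks off after step a), so the remaining steps are not recoverable from this excerpt. As an illustration only, the sketch below shows one common form of dispersion-weighted late fusion over K speakers: each modality's log-likelihood scores are z-normalized, a confidence is taken as the gap between the best and second-best score, and the audio weight alpha is set in proportion to that confidence. The normalization, the top-two gap as the dispersion measure, and the names `znorm`, `dispersion` and `fuse` are assumptions, not the procedure of [1], [2].

```python
from statistics import mean, pstdev


def znorm(scores):
    """Zero-mean, unit-variance normalization so audio and video
    log-likelihoods are on a comparable scale (assumed step)."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd if sd > 0 else 0.0 for s in scores]


def dispersion(scores):
    """Hypothetical confidence measure: gap between the best and the
    second-best normalized score over the K speakers."""
    best, second = sorted(scores, reverse=True)[:2]
    return best - second


def fuse(audio, video):
    """Weighted sum of normalized audio and video scores.

    Returns (index of the identified speaker, audio weight alpha).
    The weight of each modality is proportional to its dispersion.
    """
    if len(audio) != len(video) or len(audio) < 2:
        raise ValueError("need K >= 2 scores per modality")
    a, v = znorm(audio), znorm(video)
    da, dv = dispersion(a), dispersion(v)
    # Equal weights when neither modality separates the speakers
    alpha = da / (da + dv) if da + dv > 0 else 0.5
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(a, v)]
    return max(range(len(fused)), key=fused.__getitem__), alpha


# Audio barely separates speakers 0 and 1 (e.g. noisy speech);
# video clearly favours speaker 1.
speaker, alpha = fuse([-5.0, -5.1, -9.0, -9.0], [-9.0, -2.0, -9.0, -9.0])
print(speaker, round(alpha, 3))
```

In the example, audio alone would pick speaker 0, but its near-tied top scores give a small dispersion, so alpha drops well below 0.5 and the video scores decide in favour of speaker 1.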
“…Multi-modal integration for audio-visual speaker identification (AVSI) is one of the robust approaches [1]- [5] in noisy environments, where speech signals have relatively high levels of distortion. The main issues concerning AVSI involve an integration structure and reliability decision-making.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Moreover, speaker identification by using lips information is performed in (13) (14) . If these two technologies are collaborated, a human surveillance system with a mobile robot can be achieved, as mentioned in (15) .…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…One popular approach is to fuse the scores obtained from modality-specific classifiers. For example, in [1][2] the scores from a lip recognizer are fused with those from a speaker recognizer, and in [3] a face classifier is combined with a voice classifier using a variety of combination rules. These types of systems, however, require multiple sensors, which tend to increase system costs and require extra cooperation from users, e.g.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
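The last excerpt says a face classifier and a voice classifier are combined "using a variety of combination rules" but does not name them. As an assumption, the sketch below implements four fixed rules widely used for this kind of score-level fusion (sum, product, max, min) over per-class posterior probabilities. The function name `combine` and its input format are hypothetical and are not taken from [3].

```python
import math


def combine(posteriors, rule="sum"):
    """Fuse per-classifier posteriors with a fixed combination rule.

    posteriors: list with one entry per classifier, each a list of
    P(class | input) over the same K classes.
    Returns the index of the winning class.
    """
    k = len(posteriors[0])
    if any(len(p) != k for p in posteriors):
        raise ValueError("all classifiers must score the same K classes")
    if rule == "sum":
        fused = [sum(p[c] for p in posteriors) for c in range(k)]
    elif rule == "product":
        # Summing logs is equivalent to the product but avoids underflow;
        # the tiny floor guards against log(0).
        fused = [sum(math.log(max(p[c], 1e-300)) for p in posteriors)
                 for c in range(k)]
    elif rule == "max":
        fused = [max(p[c] for p in posteriors) for c in range(k)]
    elif rule == "min":
        fused = [min(p[c] for p in posteriors) for c in range(k)]
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(range(k), key=fused.__getitem__)


face = [0.6, 0.3, 0.1]   # hypothetical face-classifier posteriors
voice = [0.1, 0.5, 0.4]  # hypothetical voice-classifier posteriors
for r in ("sum", "product", "max", "min"):
    print(r, combine([face, voice], r))
```

With these inputs the sum, product and min rules all select class 1, while the max rule follows the single most confident classifier and selects class 0, which shows why the choice of rule matters.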