Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662)
DOI: 10.1109/cvpr.2000.854730

Multimodal speaker detection using error feedback dynamic Bayesian networks

Cited by 37 publications
(42 citation statements)
References 11 publications
“…There are a few methods proposed to solve the audio-visual source localization problem for multimedia [32], [33], but they mostly assume that the sound source is a human speaker's mouth. Thus, they cannot be applied to general multimedia content that contain non-speech audio signal.…”
Section: Foveated Video Coding Based On Audio-visual Focus Of At (mentioning)
confidence: 99%
“…Localizing and tracking speakers in enclosed spaces using AV information has increasingly attracted attention in signal processing and computer vision [36,17,7,34,13,43,48,1,3,6,5], given the complementary characteristics of each modality. Broadly speaking, the differences among existing works arise from the overall goal (tracking single vs. multiple speakers), the specific detection/tracking framework, and the AV sensor configuration.…”
Section: Related Work (mentioning)
confidence: 99%
“…Broadly speaking, the differences among existing works arise from the overall goal (tracking single vs. multiple speakers), the specific detection/tracking framework, and the AV sensor configuration. Much work has concentrated on the single-speaker case, assuming either single-person scenes [7,34,1], or multiperson scenes where only the location of the current speaker needs to be tracked [36,17,13,43,48,3]. Many of these works used simple sensor configurations (e.g.…”
Section: Related Work (mentioning)
confidence: 99%
“…Most existing methods for speaker detection are realized by combining techniques of sound localization via a microphone array and human tracking via background subtraction by using coupled Hidden Markov Models (HMMs) or Dynamic Bayesian Networks (DBNs) [11,2]. However, because of the spatial resolution of the microphone array, these methods can become ineffective in situations where speakers are physically close to each other.…”
Section: Introduction (mentioning)
confidence: 99%