ICARCV 2004 8th Control, Automation, Robotics and Vision Conference, 2004.
DOI: 10.1109/icarcv.2004.1469293

Motion estimation using audio and video fusion

Abstract: In this paper, motion estimation is proposed by fusing audio and video sensor data. The audio system consists of three microphones arranged on a Y-shaped structure, mounted on a pan-tilt camera. The camera forms the video system. Together, the audio and video systems enable the 3D position of the sound source to be estimated. Using the position estimates, a motion model, consisting of the translational velocity and acceleration of the source, is in turn estimated using a Kalman filter. The motion model allow…
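The abstract describes feeding fused 3D position estimates into a Kalman filter to recover translational velocity and acceleration. The sketch below is a minimal illustration of that idea only, assuming the audio-video front end already yields noisy 3D position measurements; the state layout, sampling interval, and noise covariances are placeholder assumptions, not the parameters used in the paper.

```python
# Illustrative constant-acceleration Kalman filter over fused 3D position
# measurements. All numeric values are assumptions for demonstration.
import numpy as np

dt = 0.1  # assumed sampling interval in seconds

# State vector: [x, y, z, vx, vy, vz, ax, ay, az]
F = np.eye(9)
for i in range(3):
    F[i, i + 3] = dt            # position integrates velocity
    F[i, i + 6] = 0.5 * dt**2   # position integrates acceleration
    F[i + 3, i + 6] = dt        # velocity integrates acceleration

# Measurement model: only the fused 3D position is observed.
H = np.zeros((3, 9))
H[:3, :3] = np.eye(3)

Q = 1e-3 * np.eye(9)   # assumed process noise covariance
R = 1e-2 * np.eye(3)   # assumed measurement noise of the fused position

x = np.zeros(9)        # state estimate
P = np.eye(9)          # state covariance

def kf_step(x, P, z):
    """One predict/update cycle for a new fused position measurement z (3,)."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(9) - K @ H) @ P_pred
    return x_new, P_new

# Feed a short synthetic track: source moving at 0.5 m/s along x.
for t in range(50):
    z = np.array([0.5 * t * dt, 0.0, 1.0]) + 0.01 * np.random.randn(3)
    x, P = kf_step(x, P, z)

print("velocity estimate:", x[3:6])
print("acceleration estimate:", x[6:9])
```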

Cited by 5 publications (4 citation statements)
References 15 publications
“…Usually, rule based systems work fairly well but are very dependent on the application domain. Estimators have been sometimes used in multimodal fusion, as in [9], but they are typically used in feature level fusion systems. Finally, many classifiers have been tested for integrating multimodal information, like Support Vector Machines (SVMs) [10], neural networks [11] and Bayesian models [12].…”
Section: Background and Related Work
Mentioning (confidence: 99%)
“…Loh et al [77] proposed a feature level fusion method for estimating the translational motion of a single speaker. They used different audio-visual features for estimating the position, velocity and acceleration of the single sound source.…”
Section: Where A(t) Is the Transition Model, B(t) Is the Control Input
Mentioning (confidence: 99%)
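The section heading above refers to the standard discrete-time state-space form used in Kalman filtering. For reference, the generic form (not the specific matrices reported by Loh et al.) is:

```latex
% Generic discrete-time state-space model; A(t) is the transition model,
% B(t) the control-input model, w and v are process and measurement noise.
\[
  \mathbf{x}(t+1) = A(t)\,\mathbf{x}(t) + B(t)\,\mathbf{u}(t) + \mathbf{w}(t),
  \qquad
  \mathbf{z}(t) = H(t)\,\mathbf{x}(t) + \mathbf{v}(t)
\]
```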
“…As a crucial component of a social robot's sensing suite, a large part of research on social robots has focused on visual data analysis. It involves human/face detection and the fusion of stereo and infrared vision on board social robots with greater flexibility and robustness [10,14,20], for the purposes of attention focusing and for synthesizing more complex social interaction concepts, like comfort zones, into the robots.…”
Section: Introduction
Mentioning (confidence: 99%)