ICARCV 2004 8th Control, Automation, Robotics and Vision Conference, 2004.
DOI: 10.1109/icarcv.2004.1469293

Motion estimation using audio and video fusion

Abstract: In this paper, motion estimation is proposed by fusing audio and video sensor data. The audio system consists of three microphones arranged on a Y-shaped structure, mounted on a pan-tilt camera. The camera forms the video system. Together, the audio and video systems enable the 3D position of the sound source to be estimated. Using the position estimates, a motion model, consisting of the translational velocity and acceleration of the source, is in turn estimated using a Kalman filter. The motion model allow…
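The abstract describes feeding fused 3D position estimates into a Kalman filter to recover translational velocity and acceleration. The sketch below is a minimal illustration of that idea only, assuming the audio-video front end already yields noisy 3D position measurements; the state layout, sampling interval, and noise covariances are placeholder assumptions, not the parameters used in the paper.

```python
# Illustrative constant-acceleration Kalman filter over fused 3D position
# measurements. All numeric values are assumptions for demonstration.
import numpy as np

dt = 0.1  # assumed sampling interval in seconds

# State vector: [x, y, z, vx, vy, vz, ax, ay, az]
F = np.eye(9)
for i in range(3):
    F[i, i + 3] = dt            # position integrates velocity
    F[i, i + 6] = 0.5 * dt**2   # position integrates acceleration
    F[i + 3, i + 6] = dt        # velocity integrates acceleration

# Measurement model: only the fused 3D position is observed.
H = np.zeros((3, 9))
H[:3, :3] = np.eye(3)

Q = 1e-3 * np.eye(9)   # assumed process noise covariance
R = 1e-2 * np.eye(3)   # assumed measurement noise of the fused position

x = np.zeros(9)        # state estimate
P = np.eye(9)          # state covariance

def kf_step(x, P, z):
    """One predict/update cycle for a new fused position measurement z (3,)."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(9) - K @ H) @ P_pred
    return x_new, P_new

# Feed a short synthetic track: source moving at 0.5 m/s along x.
for t in range(50):
    z = np.array([0.5 * t * dt, 0.0, 1.0]) + 0.01 * np.random.randn(3)
    x, P = kf_step(x, P, z)

print("velocity estimate:", x[3:6])
print("acceleration estimate:", x[6:9])
```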

Cited by 5 publications (4 citation statements)
References 15 publications
“…Usually, rule based systems work fairly well but are very dependent on the application domain. Estimators have been sometimes used in multimodal fusion, as in [9], but they are typically used in feature level fusion systems. Finally, many classifiers have been tested for integrating multimodal information, like Support Vector Machines (SVMs) [10], neural networks [11] and Bayesian models [12].…”
Section: Background and Related Work
Mentioning (confidence: 99%)
“…Loh et al [77] proposed a feature level fusion method for estimating the translational motion of a single speaker. They used different audio-visual features for estimating the position, velocity and acceleration of the single sound source.…”
Section: Where A(t) Is the Transition Model, B(t) Is the Control Input
Mentioning (confidence: 99%)
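The section heading above refers to the standard discrete-time state-space form used in Kalman filtering. For reference, the generic form (not the specific matrices reported by Loh et al.) is:

```latex
% Generic discrete-time state-space model; A(t) is the transition model,
% B(t) the control-input model, w and v are process and measurement noise.
\[
  \mathbf{x}(t+1) = A(t)\,\mathbf{x}(t) + B(t)\,\mathbf{u}(t) + \mathbf{w}(t),
  \qquad
  \mathbf{z}(t) = H(t)\,\mathbf{x}(t) + \mathbf{v}(t)
\]
```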
“…As a crucial component of a social robot's sensing suite, a large part of research on social robots has focused on visual data analysis. It involves human/face detection and the fusion of stereo and infrared vision on board social robots with greater flexibility and robustness [10,14,20], for the purposes of attention focusing and for synthesizing more complex social interaction concepts, like comfort zones, into the robots.…”
Section: Introduction
Mentioning (confidence: 99%)