Toward emotion indexing of multimedia excerpts

Paleari, Marco; Huet, Benoît

doi:10.1109/cbmi.2008.4564978

Cited by 38 publications (24 citation statements)

References 12 publications (16 reference statements)


“…Additionally, we study the applicability of several fusion schemes to further improve upon the results obtained with the individual modalities. Our results show that both unimodal approaches as well as the proposed combined system compare favorably with state-of-the-art techniques from the literature [9,12,13,14].…”

Section: Introduction (mentioning, confidence: 63%)

“… Confusion matrices for all 5 folds of our cross-validation procedure generated using the presented audio sub-system. Video: SAMMI framework [12,24] 28.0; Video sub-system [13] 37.0; LBPs+HMMs [14] 37 …”

Section: Audio-video Fusion (mentioning, confidence: 99%)

“… Audio-video: HMM [14] 56.3; SAMMI framework [12,24] 67.0; Async. feature fusion [13] 71 … The differences between the weighted sum-rule and weighted product-rule are minor, with the highest UW recall of 77.5% achieved by the weighted product-rule fusion procedure. …”

Section: Fusion (mentioning, confidence: 99%)
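The weighted sum-rule and weighted product-rule mentioned in the excerpt above are both decision-level fusion schemes: each sub-system emits per-class scores, the scores are combined, and the class with the highest fused score wins. The Python sketch below is a minimal illustration under stated assumptions, not the citing authors' implementation. It assumes each modality outputs non-negative per-class scores, uses a single hypothetical weight `w` for audio (with `1 - w` for video), and reads "UW recall" as unweighted average recall, i.e. the mean of the per-class recalls. All function names and the toy scores are hypothetical.

```python
import math

def weighted_sum_rule(audio, video, w):
    # Convex combination of the two modalities' per-class scores.
    return [w * a + (1.0 - w) * v for a, v in zip(audio, video)]

def weighted_product_rule(audio, video, w, eps=1e-12):
    # Weighted geometric combination a**w * v**(1 - w), computed in the log
    # domain; eps guards against log(0) when a modality gives a zero score.
    return [math.exp(w * math.log(a + eps) + (1.0 - w) * math.log(v + eps))
            for a, v in zip(audio, video)]

def argmax(scores):
    # Index of the highest score (the first one wins on ties).
    return max(range(len(scores)), key=scores.__getitem__)

def unweighted_recall(y_true, y_pred, n_classes):
    # UW recall: recall of each class, averaged with equal weight per class;
    # classes with no reference samples are skipped.
    per_class = []
    for c in range(n_classes):
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:
            hits = sum(1 for i in idx if y_pred[i] == c)
            per_class.append(hits / len(idx))
    return sum(per_class) / len(per_class) if per_class else 0.0

# Toy per-class scores for three samples and three classes (illustrative only).
audio_scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.9, 0.1, 0.0]]
video_scores = [[0.5, 0.4, 0.1], [0.4, 0.3, 0.3], [0.0, 0.6, 0.4]]
labels = [0, 1, 1]
w = 0.5  # hypothetical audio weight; video gets 1 - w

sum_pred = [argmax(weighted_sum_rule(a, v, w))
            for a, v in zip(audio_scores, video_scores)]
prod_pred = [argmax(weighted_product_rule(a, v, w))
             for a, v in zip(audio_scores, video_scores)]
print(sum_pred, prod_pred)  # prints [0, 1, 0] [0, 1, 1]
print(unweighted_recall(labels, sum_pred, 3),
      unweighted_recall(labels, prod_pred, 3))  # prints 0.75 1.0
```

The third toy sample shows where the two rules part ways: video scores class 0 at zero, so the product rule suppresses it and picks class 1, while the sum rule still picks class 0 on the strength of the audio score. In practice `w` would typically be tuned on held-out data rather than fixed at 0.5.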


Towards Efficient Multi-Modal Emotion Recognition

Dobrišek, Gajsek, Pavešić et al. 2013

International Journal of Advanced Robotic Systems


The paper presents a multi-modal emotion recognition system exploiting audio and video (i.e., facial expression) information. The system first processes both sources of information individually to produce corresponding matching scores and then combines the computed matching scores to obtain a classification decision. For the video part of the system, a novel approach to emotion recognition, relying on image-set matching, is developed. The proposed approach avoids the need for detecting and tracking specific facial landmarks throughout the given video sequence, which represents a common source of error in video-based emotion recognition systems, and therefore adds robustness to the video processing chain. The audio part of the system, on the other hand, relies on utterance-specific Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM) via maximum a posteriori (MAP) estimation. It improves upon the standard UBM-MAP procedure by exploiting gender information when building the utterance-specific GMMs, thus ensuring enhanced emotion recognition performance. Both the uni-modal parts and the combined system are assessed on the challenging multi-modal eNTERFACE'05 corpus with highly encouraging results. The developed system represents a feasible solution to emotion recognition that can easily be integrated into various systems, such as humanoid robots, smart surveillance systems and the like.
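The UBM-MAP step described in the abstract can be illustrated with the classical mean-only relevance-MAP adaptation used in GMM-UBM systems: each frame is softly assigned to the UBM components, and each component mean is pulled toward the data in proportion to how much data it received. The Python sketch below is an assumption-laden illustration, not the authors' code: it assumes diagonal-covariance Gaussians, adapts only the means, and uses a hypothetical relevance factor of 16 (a common choice in the speaker-verification literature). The gender-dependent extension the abstract describes is not shown, and all names are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a diagonal-covariance UBM.
    relevance=16.0 is a hypothetical default, not taken from the paper."""
    K, D = len(weights), len(means[0])
    n = [0.0] * K                       # soft frame counts per component
    f = [[0.0] * D for _ in range(K)]   # first-order (weighted sum) statistics
    for x in frames:
        logp = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                for k in range(K)]
        top = max(logp)                 # log-sum-exp shift for stability
        post = [math.exp(lp - top) for lp in logp]
        z = sum(post)
        for k in range(K):
            g = post[k] / z             # responsibility of component k for x
            n[k] += g
            for d in range(D):
                f[k][d] += g * x[d]
    adapted = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)   # data-dependent adaptation weight
        ex = [f[k][d] / n[k] if n[k] > 0 else means[k][d] for d in range(D)]
        adapted.append([alpha * ex[d] + (1 - alpha) * means[k][d]
                        for d in range(D)])
    return adapted

# Toy 1-D UBM with two components and a toy "utterance" of 16 identical frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[-2.0], [2.0]]
ubm_vars = [[1.0], [1.0]]
frames = [[3.0]] * 16
adapted = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
# Component 1 (UBM mean 2.0) absorbs almost all frames: n is about 16 and
# alpha about 0.5, so its mean moves roughly halfway toward 3.0; component 0
# receives almost no data and barely moves.
print(adapted)
```

The relevance factor controls the trade-off: components that see many frames move close to the utterance statistics, while rarely used components stay near the UBM, which is what makes the utterance-specific GMMs robust for short utterances.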


“…The subtle exchange of glances between Elizabeth and her father would be readily apparent to most human observers, but it is unlikely that a computer processing a video of the scene would be able to recognise their meaning. Furthermore, while the double entendre in Mr Bennet's remark would be clear to most human listeners, algorithmic recognition of this or other modes of speech is in its infancy (Paleari & Huet, 2008). Other research communities are developing means to communicate such semantic information (whether computed or manually generated) in ways that are able to transcend the original context of the information. This work, originating from Knowledge Representation but more popularly known as the Semantic Web, has provided languages such as the Resource Description Framework (RDF) (Beckett, 2004) and the Web Ontology Language (OWL) (Dean & Schreiber, 2004), which can be used to express concepts in such a way that "this picture has many buildings" may also imply that "it is a cityscape" and "it contains man-made objects."…”

Section: Introduction (mentioning, confidence: 99%)

What Are You Trying to Say? Format-Independent Semantic-Aware Streaming and Delivery

Thomas-Kerr, Burnett, Ritz 2011

Recent Advances on Video Coding


“…Paleari et al. [69] carried out both decision-level and feature-level fusion. They experimented with the eNTERFACE dataset and showed that decision-level fusion outperformed feature-level fusion.…”

Section: Multimodal Fusion (mentioning, confidence: 99%)

Towards an intelligent framework for multimodal affective data analysis

et al. 2015


An increasingly large amount of multimodal content is posted on social media websites such as YouTube and Facebook every day. To cope with the growth of so much multimodal data, there is an urgent need to develop an intelligent multimodal analysis framework that can effectively extract information from multiple modalities. In this paper, we propose a novel multimodal information extraction agent, which infers and aggregates the semantic and affective information associated with user-generated multimodal data in contexts such as e-learning, e-health, automatic video content tagging and human-computer interaction. In particular, the developed intelligent agent adopts an ensemble feature extraction approach by exploiting the joint use of tri-modal (text, audio and video) features to enhance the multimodal information extraction process. In preliminary experiments using the eNTERFACE dataset, our proposed multimodal system is shown to achieve an accuracy of 87.95%, outperforming the best state-of-the-art system by more than 10%, or in relative terms, a 56% reduction in error rate.
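The two figures quoted in this abstract can be related as follows, assuming error rate means 100% minus accuracy and that the 56% is a relative reduction. The baseline error and accuracy derived here are implied by the quoted numbers only; the abstract does not state them.

```latex
e_{\text{new}} = 100\% - 87.95\% = 12.05\%

\Delta_{\text{rel}} = \frac{e_{\text{old}} - e_{\text{new}}}{e_{\text{old}}} = 0.56
\;\Rightarrow\;
e_{\text{old}} = \frac{e_{\text{new}}}{1 - 0.56} = \frac{12.05\%}{0.44} \approx 27.4\%

\text{implied baseline accuracy} \approx 100\% - 27.4\% = 72.6\%,
\qquad 87.95\% - 72.6\% \approx 15.3 \text{ points}
```

Under these assumptions the implied gap of roughly 15 percentage points is consistent with the "more than 10%" claim.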
