“…The decision-level fusion methods usually relied on simple voting rules (e.g., Dy et al. [2010] and Gajsek et al. [2010]), but more nuanced decision-making schemes were also proposed. Some of these include metadecision trees [Wu and Liang 2011], cascading specialists [Kim and Lingenfelser 2010], Kalman filters, Bayesian belief integration [Chanel et al. 2011], and Markov decision networks [Krell et al. 2013].…”
Section: Major Trends In MM Affect Detectors
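The simple voting rules mentioned in the snippet above can be sketched as majority voting over the labels produced by each unimodal classifier. This is a minimal, hypothetical illustration, not any cited system's actual implementation:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-modality emotion labels by simple majority voting.

    predictions: one predicted label per unimodal classifier,
    e.g. ["happy", "happy", "neutral"]. Ties are broken by the
    order in which labels first appear in the list.
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

print(majority_vote(["happy", "happy", "neutral"]))  # happy
```

The more nuanced schemes listed (metadecision trees, Kalman filters, Bayesian belief integration) replace this hard vote with learned or probabilistic combination of the unimodal decisions.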
Affect detection is an important pattern recognition problem that has inspired researchers from several areas. The field is in need of a systematic review due to the recent influx of Multimodal (MM) affect detection systems that differ in several respects and sometimes yield incompatible results. This article provides such a survey via a quantitative review and meta-analysis of 90 peer-reviewed MM systems. The review indicated that the state of the art mainly consists of person-dependent models (62.2% of systems) that fuse audio and visual (55.6%) information to detect acted (52.2%) expressions of basic emotions and simple dimensions of arousal and valence (64.5%) with feature-level (38.9%) and decision-level (35.6%) fusion techniques. However, there were also person-independent systems that considered additional modalities to detect nonbasic emotions and complex dimensions using model-level fusion techniques. The meta-analysis revealed that MM systems were consistently (85% of systems) more accurate than their best unimodal counterparts, with an average improvement of 9.83% (median of 6.60%). However, improvements were three times lower when systems were trained on natural (4.59%) versus acted data (12.7%). Importantly, MM accuracy could be accurately predicted (cross-validated R² of 0.803) from unimodal accuracies and two system-level factors. Theoretical and applied implications and recommendations are discussed.
ACM Reference Format: Sidney K. D'Mello and Jacqueline Kory. 2015. A review and meta-analysis of multimodal affect detection systems.
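The abstract's claim that multimodal accuracy can be predicted from unimodal accuracies via a cross-validated regression can be sketched as follows. This uses entirely synthetic data and a plain least-squares fit; the survey's actual model also included two system-level factors as predictors, and nothing here reproduces its reported R² of 0.803:

```python
import numpy as np

# Synthetic illustration: predict each system's multimodal (MM) accuracy
# from its unimodal accuracies with ordinary least squares, scoring the
# fit by leave-one-out cross-validated R^2.
rng = np.random.default_rng(0)
uni = rng.uniform(0.5, 0.9, size=(30, 2))                  # two unimodal accuracies per system
mm = uni @ np.array([0.6, 0.5]) + rng.normal(0, 0.01, 30)  # synthetic MM accuracy

X = np.column_stack([np.ones(len(mm)), uni])               # design matrix with intercept
preds = np.empty_like(mm)
for i in range(len(mm)):
    keep = np.arange(len(mm)) != i                         # hold out one system
    beta, *_ = np.linalg.lstsq(X[keep], mm[keep], rcond=None)
    preds[i] = X[i] @ beta                                 # predict the held-out system

r2 = 1 - ((mm - preds) ** 2).sum() / ((mm - mm.mean()) ** 2).sum()
print(round(r2, 3))                                        # cross-validated R^2
```

Leave-one-out scoring is what makes the R² "cross-validated": each system's MM accuracy is predicted by a model that never saw that system.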
“…They used Gaussian Mixture Models (GMMs) to model each modality, then combined the modalities with a Bayesian classifier weighting scheme and support vector machines. Marc Lanze Ivan et al. [2] developed a multimodal emotion recognition system trained on a spontaneous Filipino emotion database. The system extracts voice and facial features and then uses a support vector machine to classify the correct emotion label.…”
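The scheme described in the snippet above can be sketched as late fusion of per-modality generative models. This is a hypothetical, simplified illustration: each class-conditional GMM is reduced to a single diagonal Gaussian, the data are toy, and the fixed weights stand in for a learned Bayesian weighting:

```python
import numpy as np

def fit_gaussians(X, y):
    """Per-class mean/variance for one modality's features
    (a one-component stand-in for a full GMM)."""
    return {c: (X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-6)
            for c in np.unique(y)}

def log_likelihoods(model, x):
    """Diagonal-Gaussian log-likelihood of sample x under each class."""
    return {c: float(-0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var)))
            for c, (mu, var) in model.items()}

def fuse(models, sample, weights):
    """Weighted sum of per-modality log-likelihoods; argmax is the label."""
    classes = models[0].keys()
    scores = {c: sum(w * log_likelihoods(m, x)[c]
                     for m, x, w in zip(models, sample, weights))
              for c in classes}
    return max(scores, key=scores.get)

# Toy data: two modalities (e.g., voice and facial features), two emotions.
rng = np.random.default_rng(1)
y = np.array([0] * 50 + [1] * 50)
audio = np.where(y[:, None] == 0, 0.0, 2.0) + rng.normal(0, 0.5, (100, 3))
face = np.where(y[:, None] == 0, 1.0, -1.0) + rng.normal(0, 0.5, (100, 4))

models = [fit_gaussians(audio, y), fit_gaussians(face, y)]
print(fuse(models, [audio[0], face[0]], weights=[0.4, 0.6]))  # sample 0 is class 0
```

A full system in this spirit would fit multi-component mixtures per class and learn the modality weights (or an SVM) from validation data rather than fixing them by hand.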
“…Similar to JAFFE, the Spontaneous Filipino Emotion Database is an emotion database collected from a single ethnic group [114]. It was built to train a multimodal emotion recognition system on spontaneous Filipino emotional expressions.…”
First of all, I would like to express my sincere gratitude to my supervisor, Prof. E. Cambria. It is my great fortune to have had the opportunity to pursue a Ph.D. degree under his guidance and assistance. Without his support, I would not have continued my studies and made it to the end. Next, I would like to thank Dr. Iti Chaturvedi and Dr. Soujanya Poria for their inspiring discussions and assistance during my Ph.D. study. Special thanks to Dr. Haiyun Peng and Yukun Ma, senior Ph.D. students of Prof. Cambria, for helping me begin my exploration of sentiment analysis. This Ph.D. was the most unforgettable experience of my life. I am really grateful to all my friends who cheered me up in difficult moments and shared my happiness when I made progress, including