EmoNets: Multimodal deep learning approaches for emotion recognition in video
Preprint, 2015
DOI: 10.48550/arxiv.1503.01800

Cited by 4 publications (4 citation statements); references 0 publications.
“…Multimodal networks have been proposed in both unsupervised ([30]) and supervised ([18], [22]) settings. Both [18] and [22] first train deep models on individual data modalities then use activations from these models to train a multimodal classifier.…”
Section: B. Neural Network (mentioning)
confidence: 99%
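
The late-fusion recipe attributed to [18] and [22] in the statement above (train a deep model per modality, then train a joint classifier on their activations) can be illustrated with a minimal sketch. The sketch below assumes PyTorch; the module names, feature dimensions, and random data are placeholders for illustration only, not the actual EmoNets pipeline.

# Hedged sketch: one deep model per modality, frozen as a feature extractor,
# with a small classifier trained on the concatenated activations.
import torch
import torch.nn as nn

NUM_CLASSES = 7  # e.g. the seven EmotiW emotion categories

class ModalityNet(nn.Module):
    """Stand-in for a modality-specific deep model (e.g. a face CNN or an audio net)."""
    def __init__(self, in_dim, hidden_dim=128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, NUM_CLASSES)

    def forward(self, x):
        return self.head(self.features(x))

# Assume these were already trained separately, each on its own modality.
video_net = ModalityNet(in_dim=2048)
audio_net = ModalityNet(in_dim=512)

# Fusion classifier trained on the concatenated per-modality activations.
fusion = nn.Linear(128 + 128, NUM_CLASSES)

x_video = torch.randn(32, 2048)              # placeholder batch of video features
x_audio = torch.randn(32, 512)               # placeholder batch of audio features
y = torch.randint(0, NUM_CLASSES, (32,))     # placeholder emotion labels

with torch.no_grad():                        # modality models stay frozen here
    feats = torch.cat([video_net.features(x_video),
                       audio_net.features(x_audio)], dim=1)

opt = torch.optim.SGD(fusion.parameters(), lr=0.1)
loss = nn.functional.cross_entropy(fusion(feats), y)
loss.backward()
opt.step()

Only the fusion classifier is updated in this sketch; fine-tuning the per-modality networks jointly is a possible variant, but the statement above describes the two-stage setup.
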
“…To the authors' knowledge, the only works that previously applied CNNs to expression data were those of Kahou et al [13,12] and Jung et al [11]. In [13,12], the authors developed a system for doing audio/visual emotion recognition for the Emotion Recognition in the Wild Challenge (EmotiW) [6,5], while in [11] the authors trained a network that incorporated both appearance and geometric features when doing recognition. However, one key point is that these works dealt with emotion recognition of video / image sequence data and therefore actively incorporated temporal data when computing their predictions.…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, the research focused strongly on face recognition, an active research area in recent years. In this category are the works proposed by Kahou et al 2015, Kollias et al 2015 and Wei et al 2017. Unfortunately, these solutions lose generality as they focus primarily on detecting the face without considering other aspects that make up the image.…”
Section: Color and Emotion: From Computing To Deep Learning (mentioning)
confidence: 99%