A YouTube video offers several kinds of data: text from titles and audio transcriptions, image thumbnails, and counts of likes, dislikes, and views. Despite this variety, most standard Deep Learning models use only one type of data, and the simultaneous use of multiple data sources for such problems remains rare. To shed light on this, we empirically evaluate eight multimodal fusion operations applied to embeddings extracted from the image thumbnails and video titles of YouTube videos with standard Deep Learning models: a ResNet-based SE-Net for image feature extraction and BERT for text. Experimental results show that simple operations such as summing or subtracting embeddings can improve model accuracy. On this dataset, the multimodal fusion operations achieved 81.3% accuracy, outperforming the unimodal models by 3.86% (text) and 5.79% (video).
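The abstract does not specify implementations, but the simple fusion operations it mentions can be sketched as element-wise combinations of two equal-length embedding vectors. In this minimal sketch, `fuse` is a hypothetical helper and the toy vectors stand in for real BERT (text) and SE-Net (image) embeddings:

```python
import numpy as np

def fuse(text_emb, image_emb, op="sum"):
    """Combine two equal-length embedding vectors with a simple
    fusion operation (hypothetical helper, not the paper's code)."""
    ops = {
        "sum": lambda a, b: a + b,              # element-wise addition
        "subtract": lambda a, b: a - b,         # element-wise difference
        "multiply": lambda a, b: a * b,         # Hadamard product
        "maximum": np.maximum,                  # element-wise max
        "average": lambda a, b: (a + b) / 2.0,  # element-wise mean
        "concat": lambda a, b: np.concatenate([a, b]),  # doubles dimension
    }
    return ops[op](text_emb, image_emb)

# Toy 3-dimensional embeddings standing in for the real extractors' outputs.
t = np.array([0.5, -1.0, 2.0])   # "text" embedding
v = np.array([1.5, 1.0, -2.0])   # "image" embedding

print(fuse(t, v, "sum"))           # -> [2. 0. 0.]
print(fuse(t, v, "concat").shape)  # -> (6,)
```

The fused vector can then be fed to a classifier head; concatenation is the only listed operation that changes the input dimension.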