Combining video streaming and online retailing (V2R) has been a growing trend recently. In this paper, we provide practitioners and researchers in multimedia with a cloud-based platform named Hysia for easy development and deployment of V2R applications. The system consists of: 1) a back-end infrastructure providing optimized V2R related services including data engine, model repository, model serving and content matching; and 2) an application layer which enables rapid V2R application prototyping. Hysia addresses industry and academic needs in large-scale multimedia by: 1) seamlessly integrating state-of-the-art libraries including NVIDIA video SDK, Facebook faiss, and gRPC; 2) efficiently utilizing GPU computation; and 3) allowing developers to bind new models easily to meet the rapidly changing deep learning (DL) techniques. On top of that, we implement an orchestrator for further optimizing DL model serving performance. Hysia has been released as an open source project on GitHub, and attracted considerable attention. We have published Hysia to DockerHub as an official image for seamless integration and deployment in current cloud environments.
Given the massive market of advertising and the sharply increasing online multimedia content (such as videos), it is now fashionable to promote advertisements (ads) together with the multimedia content. However, manually finding relevant ads to match the provided content is labor-intensive, and hence some automatic advertising techniques are developed. Since ads are usually hard to understand only according to its visual appearance due to the contained visual metaphor, some other modalities, such as the contained texts, should be exploited for understanding. To further improve user experience, it is necessary to understand both the ads' topic and sentiment. This motivates us to develop a novel deep multimodal multitask framework that integrates multiple modalities to achieve effective topic and sentiment prediction simultaneously for ads understanding. In particular, in our framework termed Deep 2 Ad, we first extract multimodal information from ads and learn high-level and comparable representations. The visual metaphor of the ad is decoded in an unsupervised manner. The obtained representations are then fed into the proposed hierarchical multimodal attention modules to learn task-specific representations for final prediction. A multitask loss function is also designed to jointly train both the topic and sentiment prediction models in an end-to-end manner, where bottom-layer parameters are shared to alleviate over-fitting. We conduct extensive experiments on a large-scale advertisement dataset and achieve state-of-the-art performance for both prediction tasks. The obtained results could be utilized as a benchmark for ads understanding.
Given the massive market of advertising and the sharply increasing online multimedia content (such as videos), it is now fashionable to promote advertisements (ads) together with the multimedia content. It is exhausted to find relevant ads to match the provided content manually, and hence, some automatic advertising techniques are developed. Since ads are usually hard to understand only according to its visual appearance due to the contained visual metaphor, some other modalities, such as the contained texts, should be exploited for understanding. To further improve user experience, it is necessary to understand both the topic and sentiment of the ads. This motivates us to develop a novel deep multimodal multitask framework to integrate multiple modalities to achieve effective topic and sentiment prediction simultaneously for ads understanding. In particular, our model first extracts multimodal information from ads and learn high-level and comparable representations. The visual metaphor of the ad is decoded in an unsupervised manner. The obtained representations are then fed into the proposed hierarchical multimodal attention modules to learn task-specific representations for final prediction. A multitask loss function is also designed to train both the topic and sentiment prediction models jointly in an end-to-end manner. We conduct extensive experiments on the latest and large advertisement dataset and achieve state-of-the-art performance for both prediction tasks. The obtained results could be utilized as a benchmark for ads understanding.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.