Brand-related user posts on social networks are growing at a staggering rate, with users expressing their opinions about brands by sharing multimodal posts. However, while some posts become popular, others are ignored. In this paper, we present an approach for identifying which aspects of a post determine its popularity. We hypothesize that brand-related posts may become popular because of several cues related to factual information, sentiment, vividness, and entertainment parameters about the brand. We call this ensemble of cues engagement parameters, and we propose to use these parameters to predict the popularity of brand-related user posts. Experiments on a collection of fast-food brand-related user posts crawled from Instagram show that visual and textual features are complementary in predicting the popularity of a post; that predicting popularity using our proposed engagement parameters is more accurate than predicting it directly from visual and textual features; and that our approach makes it possible to understand what drives post popularity in general as well as to isolate brand-specific drivers.
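The abstract describes predicting popularity through an intermediate layer of engagement parameters rather than directly from visual and textual features. The sketch below illustrates that two-stage idea with ordinary least squares. The linear models, the function names (`fit_linear`, `predict_linear`, `two_stage_fit`, `two_stage_predict`), and the synthetic data are assumptions for illustration only, not the authors' method.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an appended bias column (bias is the last weight row)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(X, w):
    """Apply weights returned by fit_linear."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def two_stage_fit(features, engagement, popularity):
    # Stage 1 (hypothetical): visual+textual features -> engagement parameters,
    # one column per parameter (factual, sentiment, vividness, entertainment).
    W1 = fit_linear(features, engagement)
    # Stage 2 (hypothetical): predicted engagement parameters -> popularity.
    w2 = fit_linear(predict_linear(features, W1), popularity)
    return W1, w2

def two_stage_predict(features, W1, w2):
    return predict_linear(predict_linear(features, W1), w2)

# Synthetic, noiseless data purely for demonstration.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10))          # stand-in for concatenated features
A = rng.normal(size=(10, 4))
engagement = features @ A                      # 4 hypothetical engagement parameters
popularity = engagement @ np.array([0.5, 1.0, -0.3, 2.0]) + 1.0

W1, w2 = two_stage_fit(features, engagement, popularity)
pred = two_stage_predict(features, W1, w2)
```

Because the second stage is a small linear model over named parameters, its coefficients (`w2[:4]` here) can be read as the relative influence of each engagement parameter, which loosely mirrors the interpretability the abstract claims.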
Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, we present a reference algorithm that was used within MediaEval 2010 and comment on lessons learned. The Tagging Task, Professional, involves automatically matching episodes from a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web, involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, and audio and visual features.
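The Placing Task asks for geo-coordinates per video. One common way to score such predictions, assumed here rather than taken from the overview, is the great-circle (haversine) distance to the true location, summarized as the fraction of videos placed within several radii. The helper names and the chosen radii are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def accuracy_at_radii(predicted, truth, radii_km=(1, 10, 100, 1000)):
    """Fraction of predictions within each radius of the ground-truth location."""
    dists = [haversine_km(*p, *t) for p, t in zip(predicted, truth)]
    return {r: sum(d <= r for d in dists) / len(dists) for r in radii_km}

# Toy example: one exact hit (Amsterdam), one prediction in Paris for a London video.
predicted = [(52.37, 4.90), (48.86, 2.35)]
truth = [(52.37, 4.90), (51.51, -0.13)]
acc = accuracy_at_radii(predicted, truth)
```

Reporting accuracy at several radii rather than a single mean distance keeps a few very distant errors from dominating the score.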
This paper presents Blackthorn, an efficient interactive multimodal learning approach that facilitates analysis of multimedia collections of up to 100 million items on a single high-end workstation. Blackthorn features efficient data compression, feature selection, and optimizations to the interactive learning process. The Ratio-64 data representation introduced in this paper costs only tens of bytes per item, yet preserves most of the visual and textual semantic information with good accuracy. The optimized interactive learning model scores the Ratio-64-compressed data directly, greatly reducing the computational requirements. The experiments compare Blackthorn with two baselines: conventional relevance feedback, and relevance feedback using product quantization to compress the features. The results show that Blackthorn is up to 77.5× faster than the conventional relevance feedback alternative while outperforming it with respect to the relevance of results: it vastly outperforms the baseline on recall over time and reaches up to 108% of its precision. Compared to the product quantization variant, Blackthorn is just as fast while producing more relevant results. On the full YFCC100M dataset, Blackthorn performs one complete interaction round in roughly 1 s while maintaining adequate relevance of results, thus opening multimedia collections of up to 100 million items to fully interactive, learning-based analysis.
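The abstract does not specify the layout of Ratio-64, so the sketch below uses a simpler stand-in to show the underlying idea of scoring compressed data directly: each item keeps only its k largest feature values as (index, value) pairs, and a linear relevance model is evaluated over those pairs alone, at O(k) cost per item instead of O(d). The top-k compression, the Rocchio-style weight vector, and all function names are assumptions, not Blackthorn's actual representation or learner.

```python
import numpy as np

def compress_topk(X, k):
    """Keep the k largest entries of each row as (indices, values)."""
    idx = np.argpartition(-X, k - 1, axis=1)[:, :k]
    vals = np.take_along_axis(X, idx, axis=1)
    return idx.astype(np.int32), vals.astype(np.float32)

def decompress(idx, vals, dim):
    """Rebuild dense rows (used only for the few items the user labels)."""
    X = np.zeros((idx.shape[0], dim), dtype=np.float32)
    np.put_along_axis(X, idx, vals, axis=1)
    return X

def score_compressed(idx, vals, w):
    """Linear score w.x computed over the stored entries only."""
    return (w[idx] * vals).sum(axis=1)

def rocchio_weights(X_pos, X_neg):
    """Hypothetical relevance model: mean of positives minus mean of negatives."""
    return X_pos.mean(axis=0) - X_neg.mean(axis=0)

# Synthetic collection: each item has exactly k positive features.
rng = np.random.default_rng(0)
n, d, k = 1000, 256, 8
X = np.zeros((n, d))
for i in range(n):
    cols = rng.choice(d, size=k, replace=False)
    X[i, cols] = rng.random(k) + 0.1

idx, vals = compress_topk(X, k)

# One interaction round: the user marks a few items relevant / not relevant.
labeled_pos, labeled_neg = [0, 1, 2], [3, 4, 5]
Xl = decompress(idx, vals, d)
w = rocchio_weights(Xl[labeled_pos], Xl[labeled_neg])
scores = score_compressed(idx, vals, w)   # the whole collection is scored compressed
ranking = np.argsort(-scores)             # items to show in the next round
```

With int32 indices and float32 values, this stand-in costs 8 bytes per stored entry. When items have more than k nonzero features, the compressed score only approximates the full dot product, which is the accuracy/size trade-off any such representation has to manage.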
In this paper, we present a novel approach for automatic visual summarization of a geographic area that exploits user-contributed images and related explicit and implicit metadata collected from popular content-sharing websites. With this approach, we search for a limited number of representative but diverse images to represent the area within a certain radius around a specific location. Our approach is based on a random walk with restarts over a graph that models relations between images, the visual features extracted from them, associated text, and information on the uploader and commentators. In addition to introducing a novel edge weighting mechanism, we propose a simple but effective scheme for selecting the most representative and diverse set of images based on information derived from the graph. We also present a novel evaluation protocol that does not require input from human annotators; instead, it exploits only the geographical coordinates accompanying the images to check conditions that an image set must necessarily fulfill for users to find it representative and diverse. Experiments performed on a collection of Flickr images captured around 207 locations in Paris demonstrate the effectiveness of our approach.
Index Terms: Automatic evaluation of visual summaries, graph-based models, image set diversity, image set representativeness, multimodal fusion, social media, visual summarization of geographic areas.
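Random walk with restarts is a standard technique, and a minimal power-iteration version is sketched below. The toy graph (four images, two tags, one uploader), the unit edge weights, the restart vector spread uniformly over image nodes, and `alpha = 0.15` are all assumptions; the paper's edge weighting mechanism and its representative-and-diverse selection scheme are not reproduced here.

```python
import numpy as np

def random_walk_with_restarts(W, restart, alpha=0.15, tol=1e-10, max_iter=1000):
    """
    Stationary distribution of a walk that follows weighted edges and, at every
    step, jumps back to the restart distribution with probability alpha.
    W[i, j] is the nonnegative weight of edge i -> j.
    """
    W = np.asarray(W, dtype=float)
    out = W.sum(axis=1, keepdims=True)
    # Row-normalise weights into transition probabilities; rows with no
    # outgoing edges (dangling nodes) stay all-zero.
    P = np.divide(W, out, out=np.zeros_like(W), where=out > 0)
    dangling = out.ravel() == 0
    r = np.asarray(restart, dtype=float)
    r = r / r.sum()
    p = r.copy()
    for _ in range(max_iter):
        # Mass stuck on dangling nodes is sent back to the restart distribution,
        # so p remains a probability vector.
        p_next = (1 - alpha) * (P.T @ p + p[dangling].sum() * r) + alpha * r
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical graph: nodes 0-3 images, 4-5 text tags, 6 an uploader.
edges = [(0, 4), (1, 4), (2, 4), (2, 5), (3, 5), (0, 6), (3, 6)]
W = np.zeros((7, 7))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0   # undirected, unit weights

restart = np.zeros(7)
restart[:4] = 1.0             # restart uniformly on the image nodes

scores = random_walk_with_restarts(W, restart)
ranking = np.argsort(-scores[:4])   # images ordered by stationary probability
```

Ranking image nodes by stationary probability captures representativeness only; obtaining a diverse set requires an additional selection step, such as the scheme the paper proposes.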