Visual multimedia have become an inseparable part of our digital social lives, often capturing moments tied to deep emotions. Automated visual sentiment analysis tools can provide a means of extracting the rich feelings and latent dispositions embedded in these media. In this work, we explore how Convolutional Neural Networks (CNNs), now the de facto machine learning tool in computer vision, can be applied to the task of visual sentiment prediction. We accomplish this through fine-tuning experiments using a state-of-the-art CNN and, via rigorous architecture analysis, present several modifications that improve accuracy over prior art on a dataset of images from a popular social media platform. We additionally present visualizations of the local patterns that the network learned to associate with image sentiment, offering insight into how the model perceives visual positivity (or negativity).
Deep learning algorithms base their success on building high-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples, so the development of more accurate algorithms is usually limited by the throughput of the computing devices on which they are trained. In this work, we explore how the training of a state-of-the-art neural network for computer vision can be parallelized on a distributed GPU cluster. The effect of distributing the training process is addressed from two different points of view. First, the scalability of the task and its performance in the distributed setting are analyzed. Second, the impact of distributed training methods on the final accuracy of the models is studied.
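The two scalability measures typically reported in this setting are speedup over a single-GPU baseline and parallel efficiency. A minimal sketch of how they are derived from measured training throughput; the worker counts and images-per-second figures below are hypothetical, not results from the paper:

```python
def speedup(throughput_n, throughput_1):
    """Speedup of an n-GPU run relative to the single-GPU baseline."""
    return throughput_n / throughput_1

def efficiency(throughput_n, throughput_1, n_gpus):
    """Fraction of ideal linear scaling achieved with n_gpus workers."""
    return speedup(throughput_n, throughput_1) / n_gpus

# Hypothetical measurements: training throughput (images/s) per GPU count.
measured = {1: 210.0, 2: 400.0, 4: 760.0, 8: 1400.0}
base = measured[1]
for n, t in sorted(measured.items()):
    print(f"{n} GPU(s): speedup {speedup(t, base):.2f}, "
          f"efficiency {efficiency(t, base, n):.2f}")
```

Efficiency below 1.0 at higher worker counts reflects communication and synchronization overhead, which is exactly the trade-off the abstract's two points of view (throughput vs. final accuracy) examine.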
This paper explores the potential for using Brain Computer Interfaces (BCI) as a relevance feedback mechanism in content-based image retrieval. Several experiments are performed using a rapid serial visual presentation (RSVP) of images at different rates (5 Hz and 10 Hz) on 8 users with different degrees of familiarity with BCI and the dataset. We compare the feedback from the BCI and mouse-based interfaces on a subset of TRECVid images, finding that, when users have limited time to annotate the images, both interfaces are comparable in performance. Comparing our best users in a retrieval task, we found that EEG-based relevance feedback can outperform mouse-based feedback.

MOTIVATION
The exponential growth of visual content and its huge diversity have motivated considerable research on how documents can be retrieved according to user intentions when formulating a query. Advances in image processing and computer vision have provided tools for a perceptual and semantic interpretation of both the query and the indexed content. This has allowed the development of retrieval systems capable of processing queries by example and by concept. The role of the human user during visual retrieval is critical, and their judgment about the correctness of the retrieved results can greatly speed up the search process.
This kind of relevance feedback has been demonstrated to significantly improve retrieval performance in image [10] and video [1] retrieval. Manually annotating images using a mouse, especially in a visual retrieval context, can be tedious and mentally exhausting. In such a scenario, EEG-based brain computer interfaces offer a potential solution as a mechanism to quickly annotate images.

RELATED WORK
EEG signals have been used for object detection in [2], where the authors aim to detect airplanes in a dataset of satellite images of the city of London. The work in [3] expands the catalog of objects in very simple images, where the object, on a black background, occupies the whole image. EEG signals have also been used for image retrieval in [9], where the authors used EEG relevance annotations to retrieve specific concepts in a complex dataset of keyframes from TRECVid 2005. However, while that work aimed at detecting concepts depicted by the whole image, we focus on the more challenging task of detecting a local object in a complex scene. Another similar work [8] addresses the use of EEG for image retrieval by formu...
This paper extends our previous work on the potential of EEG-based brain computer interfaces to segment salient objects in images. The proposed system analyzes the Event Related Potentials (ERP) generated by the rapid serial visual presentation of windows on the image. The detection of the P300 signal allows estimating a saliency map of the image, which is used to seed a semi-supervised object segmentation algorithm. Thanks to the new contributions presented in this work, the average Jaccard index improved from 0.47 to 0.66 on our publicly available dataset of images, object masks and captured EEG signals. This work also studies alternative architectures to the original one, the impact of object occupation in each image window, and a more robust evaluation based on statistical analysis and a weighted F-score.
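The Jaccard index used to score the segmentations above can be computed directly from binary masks as intersection over union. A minimal sketch; the toy masks and the empty-union convention here are illustrative choices, not taken from the paper's evaluation code:

```python
def jaccard_index(mask_a, mask_b):
    """Intersection-over-union of two equal-length binary masks.

    Masks are flat sequences of 0/1 pixel labels. An empty union is
    scored 1.0 here by convention (two empty masks agree perfectly).
    """
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0

# Toy 1x8 masks: predicted segmentation vs. ground-truth object mask.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
gt   = [0, 1, 1, 0, 0, 0, 1, 1]
print(jaccard_index(pred, gt))  # 3 overlapping pixels / 5 in the union = 0.6
```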
Evaluating image retrieval systems in a quantitative way, for example by computing measures like mean average precision, allows for objective comparison against a ground truth. However, in cases where ground truth is not available, the only alternative is to collect feedback from a user. Thus, qualitative assessments become important to better understand how the system works. Visualizing the results could be, in some scenarios, the only way to evaluate the results obtained and also the only opportunity to identify that a system is failing. This necessitates developing a User Interface (UI) for a Content Based Image Retrieval (CBIR) system that allows visualization of results and improvement via capturing user relevance feedback. A well-designed UI facilitates understanding of the performance of the system, both in cases where it works well and, perhaps more importantly, those which highlight the need for improvement. Our open-source system implements three components to help researchers quickly develop these capabilities for their retrieval engine. We present: a web-based user interface to visualize retrieval results and collect user annotations; a server that simplifies connection with any underlying CBIR system; and a server that manages the search engine data. The software itself is described in a separate submission to the ACM MM Open Source Software Competition.
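The mean average precision mentioned above can be computed from ranked relevance judgments. A minimal sketch of the standard definition; the two toy queries and their relevance flags are hypothetical, not from any dataset used by the system:

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: 0/1 flags, one per retrieved result, in rank order.
    AP averages precision@k over the ranks k where a relevant item appears.
    """
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_query_relevance):
    """Mean of per-query average precision over a set of queries."""
    return sum(average_precision(r) for r in per_query_relevance) / len(per_query_relevance)

# Two hypothetical queries: relevance of the top-5 retrieved images each.
print(mean_average_precision([[1, 0, 1, 0, 0], [0, 1, 1, 0, 1]]))
```

When ground truth is missing, the same `ranked_relevance` flags can instead be filled in from the user annotations that the UI collects, which is exactly the feedback loop the abstract describes.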
Abstract. This work presents a browser that supports two strategies for video browsing: navigation through visual hierarchies and retrieval of similar images and objects. The input videos are first processed by a keyframe extractor to reduce temporal redundancy and decrease the number of elements to consider. The generated keyframes are hierarchically clustered with the Hierarchical Cellular Tree (HCT) algorithm, an indexing technique that also allows the creation of data structures suitable for browsing. Different clustering criteria are available in the current implementation, based on four MPEG-7 visual descriptors computed at the global scale. The navigation can directly drive the user to the video timestamps that best match the query, or to a keyframe which is globally or locally similar in visual terms to the query. In the latter case, a visual search engine is also available to find other similar keyframes or regions, also based on MPEG-7 visual descriptors.
This paper presents a graphical environment for the annotation of still images that works at both the global and local scales. At the global scale, each image can be tagged with positive, negative and neutral labels referring to a semantic class from an ontology. These annotations can be used to train and evaluate an image classifier. A finer annotation at the local scale is also available for interactive segmentation of objects. This process is formulated as a selection of regions from a precomputed hierarchical partition called a Binary Partition Tree. Three different semi-supervised methods are presented and evaluated: bounding boxes, scribbles and hierarchical navigation. The implemented Java source code is published under a free software license.
Metric Access Methods (MAMs) are indexing techniques which allow working in generic metric spaces. MAMs are therefore especially useful for Content-Based Image Retrieval systems based on features which use non-Lp norms as similarity measures. MAMs naturally allow the design of image browsers due to their inherent hierarchical structure. The Hierarchical Cellular Tree (HCT), a MAM-based indexing technique, provides the starting point of our work. In this paper, we describe some limitations detected in the original formulation of the HCT and propose modifications to both the index building and the search algorithm. First, the covering radius, which is defined as the distance from the representative to the furthest element in a node, may not cover all the elements belonging to the node's subtree. Therefore, we propose to redefine the covering radius as the distance from the representative to the furthest element in the node's subtree. This new definition is essential to guarantee a correct construction of the HCT. Second, the proposed Progressive Query retrieval scheme can be redesigned to perform the nearest neighbor operation in a more efficient way. We propose a new retrieval scheme which takes advantage of the benefits of the search algorithm used in the index building. Furthermore, while the evaluation of the HCT in the original work was only subjective, we propose an objective evaluation based on two aspects which are crucial in any approximate search algorithm: the retrieval time and the retrieval accuracy. Finally, we illustrate the usefulness of the proposal by presenting some actual applications.
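The covering-radius redefinition above can be made concrete on a toy tree. The following is an illustrative sketch, not the actual HCT implementation: the dict-based node layout, the Euclidean metric and the point coordinates are all assumptions chosen to show why the node-local radius can under-cover the subtree.

```python
import math

def dist(p, q):
    """Euclidean distance; any proper metric works for a MAM."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def subtree_elements(node):
    """All elements stored in a node and, recursively, in its children."""
    elems = list(node["elements"])
    for child in node.get("children", []):
        elems.extend(subtree_elements(child))
    return elems

def covering_radius_node(node):
    """Original definition: furthest element within the node itself."""
    return max(dist(node["rep"], e) for e in node["elements"])

def covering_radius_subtree(node):
    """Proposed definition: furthest element in the whole subtree."""
    return max(dist(node["rep"], e) for e in subtree_elements(node))

# Toy 2-level tree: a child holds a point further from the root's
# representative than anything stored in the root node itself, so the
# original (node-local) radius fails to cover the subtree.
root = {
    "rep": (0.0, 0.0),
    "elements": [(0.0, 0.0), (1.0, 0.0)],
    "children": [
        {"rep": (1.0, 0.0), "elements": [(1.0, 0.0), (4.0, 0.0)]},
    ],
}
print(covering_radius_node(root))     # 1.0: misses the point at (4, 0)
print(covering_radius_subtree(root))  # 4.0: covers the entire subtree
```

A search algorithm that prunes branches using the node-local radius could wrongly discard the subtree containing (4, 0), which is why the subtree-wide definition is needed for correct construction and search.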