This study emphasizes the need for standardized measurement tools in human–robot interaction (HRI). If we are to make progress in this field, we must be able to compare results across studies. A literature review was performed on the measurement of five key concepts in HRI: anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety. The results have been distilled into five consistent questionnaires using semantic differential scales. We report reliability and validity indicators based on several empirical studies that used these questionnaires. It is our hope that these questionnaires can be used by robot developers to monitor their progress. Psychologists are invited to develop the questionnaires further by adding new concepts, and to conduct additional validations where necessary.
In this paper, we introduce a novel multimodal fashion search paradigm in which e-commerce data is searched with a multimodal query composed of both an image and text. In this setting, the query image shows a fashion product that the user likes, and the query text allows the user to change certain product attributes to fit the product to their desire. Multimodal search gives users the means to clearly express what they are looking for. This is in contrast to current e-commerce search mechanisms, which are cumbersome and often fail to grasp the customer's needs. Multimodal search requires intermodal representations of visual and textual fashion attributes which can be mixed and matched to form the user's desired product, and which have a mechanism to indicate when a visual and a textual fashion attribute represent the same concept. With a neural network, we induce a common, multimodal space for visual and textual fashion attributes in which their inner product measures their semantic similarity. We build a multimodal retrieval model which operates on the obtained intermodal representations and ranks images based on their relevance to a multimodal query. We demonstrate that our model is able to retrieve images that both exhibit the necessary query image attributes and satisfy the query texts. Moreover, we show that our model substantially outperforms two state-of-the-art retrieval models adapted to multimodal fashion search.
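The core mechanism the abstract describes, projecting visual and textual attributes into one common space where the inner product acts as a semantic-similarity score, can be sketched as follows. This is a minimal illustration, not the paper's trained network: the projection matrices `W_img` and `W_txt`, the dimensionalities, and the query-scoring rule are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_txt, d_common = 512, 300, 128   # assumed feature sizes

# Stand-ins for the learned projections into the common multimodal space.
W_img = rng.normal(size=(d_common, d_img)) / np.sqrt(d_img)
W_txt = rng.normal(size=(d_common, d_txt)) / np.sqrt(d_txt)

def embed_image(v):
    """Project a raw image feature (e.g. a CNN activation) into the common space."""
    return W_img @ v

def embed_text(t):
    """Project a raw text feature (e.g. a word-embedding vector) into the common space."""
    return W_txt @ t

def similarity(v, t):
    """Inner product in the common space as a semantic-similarity score."""
    return float(embed_image(v) @ embed_text(t))

def score(candidate_img, query_img, query_txt, alpha=0.5):
    """One simple way to score a candidate against a multimodal query:
    a weighted sum of similarity to the query image and the query text."""
    return alpha * float(embed_image(candidate_img) @ embed_image(query_img)) \
        + (1 - alpha) * similarity(candidate_img, query_txt)
```

Ranking for retrieval then amounts to sorting candidate images by `score` against the multimodal query.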
Automatic linking of online content improves navigation possibilities for end users. We focus on linking content generated by users to other relevant sites. In particular, we study the problem of linking information between different usages of the same language, e.g., colloquial and formal idioms, or the language of consumers versus the language of sellers. The challenge is that the same items are described using very distinct vocabularies. As a case study, we investigate a new task of linking textual Pinterest.com pins (colloquial) to online webshops (formal). Given this task, our key insight is that we can learn associations between formal and informal language by utilizing aligned data and probabilistic modeling. Specifically, we thoroughly evaluate three different modeling paradigms based on probabilistic topic modeling: monolingual latent Dirichlet allocation (LDA), bilingual LDA (BiLDA), and a novel multi-idiomatic LDA model (MiLDA). We compare these to the unigram model with a Dirichlet prior. Our results for all three topic models reveal the usefulness of modeling the hidden thematic structure of the data through topics, as opposed to a linking model based solely on the standard unigram. Moreover, our proposed MiLDA model is able to deal with intrinsically multi-idiomatic data by considering the shared vocabulary between the aligned document pairs. The proposed MiLDA obtains the greatest stability (least variation across parameter settings) and the highest mean average precision scores in the linking task.
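The linking step common to all three topic-model variants can be sketched as comparing per-document topic distributions: a pin and a webshop page described in very different vocabularies can still share topics. The distributions below are hand-picked placeholders for inferred ones, and cosine similarity is one plausible ranking rule, not necessarily the paper's.

```python
import numpy as np

def rank_by_topics(pin_theta, shop_thetas):
    """Rank candidate webshops by cosine similarity of their topic
    distributions to the pin's topic distribution (best match first)."""
    pin = pin_theta / np.linalg.norm(pin_theta)
    shops = shop_thetas / np.linalg.norm(shop_thetas, axis=1, keepdims=True)
    scores = shops @ pin
    return np.argsort(-scores)

pin_theta = np.array([0.7, 0.2, 0.1])        # pin mostly about topic 0
shop_thetas = np.array([[0.1, 0.8, 0.1],     # shop 0: mostly topic 1
                        [0.6, 0.3, 0.1]])    # shop 1: mostly topic 0
# rank_by_topics(pin_theta, shop_thetas) ranks shop 1 above shop 0
```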
Inferring user interests on social media from text and images is addressed as a multi-class classification problem. We propose approaches to infer user interests on social media, where multi-modal data (text, images, etc.) often exists. We use user-generated data from Pinterest.com as a natural expression of users' interests. We treat the category label assigned to each pin (an image-text pair) as representing a broad user interest, since users collect images they like on the social media platform and often assign them a category label. This task is useful beyond Pinterest because most user-generated data on the Web is not readily categorized into interest labels. In addition to predicting users' interests, our main contribution is exploiting a multi-modal space composed of images and text. This is a natural approach, since humans express their interests with a combination of modalities. Exploiting multi-modal spaces in this context has received little attention in the literature. We performed eleven experiments using state-of-the-art image and textual representations, such as convolutional neural networks, word embeddings, and bags of visual and textual words. Our experimental results show that jointly processing image and text increases the overall interest classification accuracy compared to uni-modal representations (i.e., using only text or only images). Keywords: inferring user interests, user modeling, term frequencies, bag of words (BoW), convolutional neural networks (CNN), word embeddings.
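The multimodal fusion the abstract describes can be sketched in its simplest form: concatenate an image representation (e.g. CNN features) with a text representation (e.g. averaged word embeddings) and classify the joint vector. This is a hedged illustration, not the paper's exact pipeline; the random classifier weights, feature sizes, and the number of interest categories are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, d_img, d_txt = 32, 512, 300   # assumed: 32 interest categories

# Stand-in for a trained linear classifier over the fused representation.
W = rng.normal(size=(n_classes, d_img + d_txt))

def fuse(img_feat, txt_feat):
    """Joint multimodal representation: concatenation of both modalities."""
    return np.concatenate([img_feat, txt_feat])

def predict_interest(img_feat, txt_feat):
    """Return the index of the highest-scoring interest class."""
    return int(np.argmax(W @ fuse(img_feat, txt_feat)))
```

A uni-modal baseline would simply drop one of the two inputs before classification, which is the comparison the experiments report.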