Abstract. The surge of opinionated online texts provides a wealth of information that can be exploited to analyze users' viewpoints and opinions on various topics. This article presents VODUM, an unsupervised topic model designed to jointly discover viewpoints, topics, and opinions in text. We hypothesize that partitioning topical words and viewpoint-specific opinion words using part-of-speech tags helps to discriminate and identify viewpoints. Quantitative and qualitative experiments on the Bitterlemons collection show the performance of our model: it outperforms state-of-the-art baselines in generalizing data and identifying viewpoints. This result stresses how important the separation of topical and opinion words is, and how it impacts the accuracy of viewpoint identification.
In this paper we investigate the use of XML structure in multimedia retrieval, particularly in context-based image retrieval. We propose two methods to represent multimedia objects: the first is based on an implicit use of the textual and structural context of multimedia objects, whereas the second is based on an explicit use of both sources. Experimental evaluation is carried out using the INEX Multimedia Fragments task 2006 and 2007. We show that there is a strong vocabulary relation between the query and the multimedia object representation, and that using XML structure significantly improves the effectiveness of multimedia retrieval.
In this paper we investigate information retrieval in microblogs, exploiting different state-of-the-art features. Microbloggers, besides posting microblogs, search for fresh and relevant information related to their interests by submitting a query to a microblog search engine. The majority of approaches that collect information from microblogs exploit features such as the recency of the microblog and the authority of its author to improve the quality of their results. In this paper, we evaluated some of the state-of-the-art features to determine those that discriminate relevant from irrelevant microblogs given an information need. Then, we used the selected features to learn models and determine their effectiveness in a microblog search task. We conducted a series of experiments using the dataset and topics of the TREC Microblog 2011 and 2012 tracks. Results show that content, hypertextuality, and recency are the best predictors of relevance. We also found that Naive Bayes was the most effective learning approach for this type of classification.
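The classification setup the abstract describes can be sketched as a tiny Bernoulli Naive Bayes over binary microblog features. This is an illustrative sketch, not the paper's code: the three features (content match, hypertextuality, recency), the toy training data, and all function names are invented here to show how such a relevance classifier operates.

```python
# Hedged sketch: Bernoulli Naive Bayes over three binary microblog
# features -- content match, hypertextuality (has a URL), recency --
# the predictors the study found most effective. Toy data, not the
# paper's implementation.
import math
from collections import defaultdict

def train_nb(examples):
    """examples: list of (feature tuple of 0/1, label 0/1)."""
    counts = {0: 0, 1: 0}
    feat_counts = {0: defaultdict(int), 1: defaultdict(int)}
    for feats, label in examples:
        counts[label] += 1
        for i, v in enumerate(feats):
            feat_counts[label][i] += v
    n_feats = len(examples[0][0])
    model = {}
    for label in (0, 1):
        prior = counts[label] / len(examples)
        # Laplace smoothing for P(feature = 1 | label)
        probs = [(feat_counts[label][i] + 1) / (counts[label] + 2)
                 for i in range(n_feats)]
        model[label] = (prior, probs)
    return model

def predict(model, feats):
    best, best_lp = None, -math.inf
    for label, (prior, probs) in model.items():
        lp = math.log(prior)
        for v, p in zip(feats, probs):
            lp += math.log(p if v else 1 - p)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Toy training set: (content match, has URL, recent) -> relevant?
data = [((1, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 1),
        ((0, 0, 1), 0), ((0, 1, 0), 0), ((0, 0, 0), 0)]
model = train_nb(data)
print(predict(model, (1, 1, 1)))  # a microblog matching all three features
```

In practice the paper's experiments would use real TREC Microblog judgments and richer feature extractors; the point of the sketch is only the model family (Naive Bayes over relevance features).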
Several deep neural ranking models have been proposed in the recent IR literature. While their transferability to a single target domain represented by a dataset has been widely addressed using traditional domain adaptation strategies, the question of their cross-domain transferability is still under-studied. We study here to what extent neural ranking models catastrophically forget old knowledge acquired from previously observed domains after acquiring new knowledge, leading to a performance decrease on those domains. Our experiments show that the effectiveness of neural IR ranking models is achieved at the cost of catastrophic forgetting, and that a lifelong learning strategy using a cross-domain regularizer successfully mitigates the problem. Using an explanatory approach built on a regression model, we also show the effect of domain characteristics on the rise of catastrophic forgetting. We believe that the obtained results can be useful for both theoretical and practical future work in neural IR.
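The cross-domain regularizer mentioned above can be sketched as a quadratic penalty in the spirit of elastic weight consolidation: after training on an old domain, each parameter is anchored to its learned value, weighted by an importance term, so that fitting a new domain cannot freely overwrite old knowledge. This is a hedged sketch under that assumption, not the paper's actual regularizer; all names and values are illustrative.

```python
# Illustrative lifelong-learning regularizer (EWC-style quadratic
# penalty), not the paper's implementation.
# Total loss = new-domain loss + (lam / 2) * sum_i F_i * (w_i - w_i*)^2,
# where w_i* are the parameters learned on the old domain and F_i is
# an importance weight (e.g. a Fisher-information estimate).

def regularized_loss(new_domain_loss, params, old_params, importance, lam=1.0):
    """Penalize drift of each parameter away from its old-domain value."""
    penalty = 0.5 * lam * sum(
        f * (w - w0) ** 2
        for f, w, w0 in zip(importance, params, old_params)
    )
    return new_domain_loss + penalty

# One parameter that moved from 1.0 to 2.0 with importance 4.0:
# 1.0 + 0.5 * 1.0 * 4.0 * (2.0 - 1.0)**2 = 3.0
print(regularized_loss(1.0, [2.0], [1.0], [4.0]))
```

Parameters deemed important for the old domain (large `F_i`) are held near their previous values, while unimportant ones stay free to adapt to the new domain, which is what mitigates catastrophic forgetting.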
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.