We study a new task, proactive information retrieval, by combining implicit relevance feedback and collaborative filtering. We have constructed a controlled experimental setting, a prototype application, in which users try to find interesting scientific articles by browsing their titles. Implicit feedback is inferred from eye movement signals, with discriminative hidden Markov models estimated from existing data in which explicit relevance feedback is available. Collaborative filtering is carried out using the User Rating Profile model, a state-of-the-art probabilistic latent variable model, computed using Markov chain Monte Carlo techniques. For new document titles, the prediction accuracy with eye movements, collaborative filtering, and their combination was significantly better than chance. The best prediction accuracy still leaves room for improvement but shows that proactive information retrieval and the combination of many sources of relevance feedback are feasible.
We tackle the problem of new users or documents in collaborative filtering. Generalization over users by grouping them into user groups is beneficial when a rating is to be predicted for a relatively new document with only a few observed ratings. Analogously, generalization over documents improves predictions in the case of new users. We show that if either users or documents, or both, are new, two-way generalization becomes necessary. We demonstrate the benefits of grouping users, grouping documents, and two-way grouping, with artificial data and in two case studies with real data. We have introduced a probabilistic latent grouping model for predicting the relevance of a document to a user. The model assumes a latent group structure for both users and documents. We compare the model against a state-of-the-art method, the User Rating Profile model, in which only the users have a latent group structure. We compute the posterior of both models by Gibbs sampling. The Two-Way Model predicts relevance more accurately when the target consists of both new documents and new users. The reason is that generalization over documents becomes beneficial for new documents and at the same time generalization over users is needed for new users.
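To make the idea of two-way generalization concrete, the following is a minimal sketch, not the model from the abstract. It assumes that user and document group assignments are already given (the abstract instead infers them as latent variables by Gibbs sampling) and estimates a smoothed relevance rate for each (user group, document group) pair. The function names, the smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import numpy as np


def two_way_group_rates(ratings, user_group, doc_group, n_user_groups,
                        n_doc_groups, alpha=1.0):
    """Estimate P(relevant | user group, document group).

    ratings: iterable of (user, doc, r) with r in {0, 1}.
    user_group, doc_group: dicts mapping ids to group indices
        (assumed known here; latent in the abstract's model).
    alpha: symmetric Beta smoothing pseudo-count (an assumption).
    """
    pos = np.zeros((n_user_groups, n_doc_groups))
    tot = np.zeros((n_user_groups, n_doc_groups))
    for user, doc, r in ratings:
        g, h = user_group[user], doc_group[doc]
        pos[g, h] += r
        tot[g, h] += 1
    # Posterior mean of a Bernoulli rate under a Beta(alpha, alpha) prior.
    return (pos + alpha) / (tot + 2 * alpha)


def predict(rates, user_probs, doc_probs):
    """Average the group-level rates over soft group memberships.

    For a new user or document with no ratings, the membership
    vector can simply be a prior over groups.
    """
    return float(np.asarray(user_probs) @ rates @ np.asarray(doc_probs))


# Toy data: two users and two documents, each in its own group.
ratings = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
rates = two_way_group_rates(ratings, {0: 0, 1: 1}, {0: 0, 1: 1}, 2, 2)

# Known user and document: memberships are certain.
known = predict(rates, [1.0, 0.0], [1.0, 0.0])
# New user and new document: fall back to uniform group priors.
unknown = predict(rates, [0.5, 0.5], [0.5, 0.5])
```

Because the prediction depends only on group memberships, a pair consisting of a new user and a new document can still be scored; the price is that the prediction is only as specific as the membership information available.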
Digitalization of content and the exponential growth of the Internet and electronic commerce are changing the media industry. The availability of structured content enables new ways to produce and deliver information. This paper explains the role of semantic metadata in developing content for an adaptive news service in the SmartPush project. In SmartPush, news content is categorized using semi-automatic tools and pre-defined vocabularies. Metadata-enhanced content is then matched against user profiles to provide customers with a personalized news service. After delivering the personalized news to the customer, the SmartPush system adapts the personalization based on user feedback. This paper discusses the requirements of personalized content services and the challenges of an approach based on structured metadata. We describe how supporting ontologies for the content were developed and maintained and what kinds of tools were developed to support the creation of structured metadata. We also present some results of the pilot phase of the project and introduce some of the issues observed during the system implementation and the field trial.
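The matching and adaptation loop described above can be illustrated with a minimal sketch. It does not reproduce SmartPush; it assumes that both article metadata and user profiles are weighted concept vectors, that matching is cosine similarity, and that feedback is a signed value that shifts profile weights. The function names, the concept labels, and the learning rate are hypothetical.

```python
import math


def match_score(profile, metadata):
    """Cosine similarity between a user profile and article metadata.

    Both arguments are dicts mapping a concept name to a weight.
    """
    dot = sum(w * profile.get(c, 0.0) for c, w in metadata.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_m = math.sqrt(sum(w * w for w in metadata.values()))
    if norm_p == 0.0 or norm_m == 0.0:
        return 0.0
    return dot / (norm_p * norm_m)


def adapt_profile(profile, metadata, feedback, rate=0.5):
    """Return a new profile shifted toward (+1) or away from (-1)
    the concepts of an article the user gave feedback on."""
    updated = dict(profile)
    for concept, weight in metadata.items():
        updated[concept] = updated.get(concept, 0.0) + rate * feedback * weight
    return updated


# Hypothetical concepts and weights.
profile = {"sports": 1.0, "economy": 0.0}
article = {"sports": 1.0}

score_before = match_score(profile, article)
# The user signals disinterest, so the sports weight is reduced.
profile_after = adapt_profile(profile, article, feedback=-1)
```

Returning a new profile instead of mutating the old one keeps each adaptation step easy to inspect, which is useful when tuning how strongly feedback should move the profile.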