With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be utilized as the auxiliary data when mining the target short texts data. In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous works for enhancing short text clustering with related long texts exist, most of them ignore the semantic and topical inconsistencies between the target and auxiliary data and may hurt the clustering performance on the short texts. To accommodate the possible inconsistencies between source and target data, we propose a novel topic model -Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistencies between data sets. We demonstrate through large-scale clustering experiments on both advertisements and Twitter data that we can obtain superior performance over several state-of-art techniques for clustering short text documents.
A central goal of collaborative filtering (CF) is to rank items by their utilities with respect to individual users in order to make personalized recommendations. Traditionally, this is often formulated as a rating prediction problem. However, it is more desirable for CF algorithms to address the ranking problem directly without going through an extra rating prediction step. In this paper, we propose the probabilistic latent preference analysis (pLPA) model for ranking predictions by directly modeling user preferences with respect to a set of items rather than the rating scores on individual items. From a user's observed ratings, we extract his preferences in the form of pairwise comparisons of items which are modeled by a mixture distribution based on BradleyTerry model. An EM algorithm for fitting the corresponding latent class model as well as a method for predicting the optimal ranking are described. Experimental results on real world data sets demonstrated the superiority of the proposed method over several existing CF algorithms based on rating predictions in terms of ranking performance measure NDCG.
No abstract
In this paper, we describe our solutions to the weekly recommendation track and social network track of the CAMRA 2010 challenge. The key challenge in the weekly recommendation track is designing models that can cope with time dependent user or item characteristics. Toward this goal, we compared two general approaches, one is a data weighting approach, the other is a time-aware modeling approach. Both approaches can be implemented by extending either the well known neighborhood model or the matrix factorization. For the social network track, we developed and compared two extensions of the matrix factorization models for incorporating the social network structure, namely collective matrix factorization(CMF) and network regularized matrix factorization(NRMF). Experimental results shows that the use of temporal information can lead to significant improvement on the weekly recommendation track whereas for the social network track, the NRMF could lead to minor improvement by combining the social network with the rating data.
Most existing collaborative filtering models only consider the use of user feedback (e.g., ratings) and meta data (e.g., content, demographics). However, in most real world recommender systems, context information, such as time and social networks, are also very important factors that could be considered in order to produce more accurate recommendations. In this work, we address several challenges for the context aware movie recommendation tasks in CAMRa 2010: (1) how to combine multiple heterogeneous forms of user feedback? (2) how to cope with dynamic user and item characteristics? (3) how to capture and utilize social connections among users? For the first challenge, we propose a novel ranking based matrix factorization model to aggregate explicit and implicit user feedback. For the second challenge, we extend this model to a sequential matrix factorization model to enable time-aware parametrization. Finally, we introduce a network regularization function to constrain user parameters based on social connections. To the best of our knowledge, this is the first study that investigates the collective modeling of social and temporal dynamics. Experiments on the CAMRa 2010 dataset demonstrated clear improvements over many baselines. ACM Reference Format:Liu, N. N., He, L., and Zhao, M. 2013. Social temporal collaborative ranking for context aware movie recommendation.
Data sparsity is a major problem for collaborative filtering (CF) techniques in recommender systems, especially for new users and items. We observe that, while our target data are sparse for CF systems, related and relatively dense auxiliary data may already exist in some other more mature application domains. In this paper, we address the data sparsity problem in a target domain by transferring knowledge about both users and items from auxiliary data sources. We observe that in different domains the user feedbacks are often heterogeneous such as ratings vs. clicks. Our solution is to integrate both user and item knowledge in auxiliary data sources through a principled matrix-based transfer learning framework that takes into account the data heterogeneity. In particular, we discover the principle coordinates of both users and items in the auxiliary data matrices, and transfer them to the target domain in order to reduce the effect of data sparsity. We describe our method, which is known as coordinate system transfer or CST, and demonstrate its effectiveness in alleviating the data sparsity problem in collaborative filtering. We show that our proposed method can significantly outperform several state-of-the-art solutions for this problem.
Recommender systems have to deal with the cold start problem as new users and/or items are always present. Rating elicitation is a common approach for handling cold start. However, there still lacks a principled model for guiding how to select the most useful ratings. In this paper, we propose a principled approach to identify representative users and items using representative-based matrix factorization. Not only do we show that the selected representatives are superior to other competing methods in terms of achieving good balance between coverage and diversity, but we also demonstrate that ratings on the selected representatives are much more useful for making recommendations (about 10% better than competing methods). In addition to illustrating how representatives help solve the cold start problem, we also argue that the problem of finding representatives itself is an important problem that would deserve further investigations, for both its practical values and technical challenges.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.