ABSTRACT. A keyword query is the representation of a user's information need, and is the result of a complex cognitive process that often leads to under-specification. We propose an unsupervised method, Latent Concept Modeling (LCM), for mining and modeling latent search concepts in order to recreate the conceptual view of the original information need. We use Latent Dirichlet Allocation (LDA) to extract highly specific query-related topics from pseudo-relevant feedback documents. We define these topics as the latent concepts of the user query. We perform a thorough evaluation of our approach over two large ad-hoc TREC collections. Our findings reveal that the proposed method accurately models latent concepts, while being very effective in a query expansion retrieval setting.
RÉSUMÉ. A query is the representation of a user's information need, and is the result of a complex cognitive process that often leads to a poor choice of keywords. We propose an unsupervised method for modeling the implicit concepts of a query, with the aim of recreating the conceptual representation of the initial information need. We use Latent Dirichlet Allocation (LDA) to detect the implicit concepts of the query using pseudo-relevant documents. We evaluate this method in depth using two TREC test collections. We find in particular that our approach models the implicit concepts of the query accurately, while achieving good performance in a document retrieval setting.
Modern Information Retrieval (IR) systems have become more and more complex, involving a large number of parameters. For example, a system may choose from a set of possible retrieval models (BM25, language model, etc.), or various query expansion parameters, whose values greatly influence the overall retrieval effectiveness. Traditionally, these parameters are set at a system level based on training queries, and the same parameters are then used for different queries. We observe that it may not be easy to set all these parameters separately, since they can be dependent. In addition, a global setting for all queries may not best fit all individual queries with different characteristics. The parameters should be set according to these characteristics. In this article, we propose a novel approach to tackle this problem by dealing with the entire system configurations (i.e., a set of parameters representing an IR system behaviour) instead of selecting a single parameter at a time. The selection of the best configuration is cast as a problem of ranking different possible configurations given a query. We apply learning-to-rank approaches for this task. We exploit both the query features and the system configuration features in the learning-to-rank method so that the selection of configuration is query dependent. The experiments we conducted on four TREC ad hoc collections show that this approach can significantly outperform the traditional method to tune system configuration globally (i.e., grid search) and leads to higher effectiveness than the top performing systems of the TREC tracks. We also perform an ablation analysis on the impact of different features on the model learning capability and show that query expansion features are among the most important for adaptive systems.
The study presented in this article is built on the results and conclusions of the previous descriptive analysis studies but moves a step further by performing a predictive analysis: We investigate how system parameters can be set to fit a given query, i.e., a query-dependent setting of system parameters. We assume that some parameters of the system can be set on the fly at querying time, and a retrieval system allows us to set different values for the parameters easily. This is indeed the case for most IR systems nowadays. Retrieval platforms such as Terrier [61], Lemur [70], or Lucene [53] allow us to set parameters for the retrieval step. For example, one may choose between several retrieval models (e.g., BM25, language models), different query expansion schemes, and so on. We target this group of parameters that can be set at query time. In contrast, we assume that an IR system has already built an index that cannot be changed easily. For example, it would be difficult to choose between different stemmers at query time, unless we construct several indexes using different stemmers. We exclude these parameters that cannot be set at query time in this study.
The problem we tackle in this article is query-dependent param...
Location-based social networks (LBSNs), such as Foursquare, have fostered the emergence of new tasks such as recommending venues a user might wish to visit. In the literature, recommending venues has typically been addressed using user-centric recommendation approaches relying on collaborative filtering techniques. Such approaches not only require many users with detailed profiles to be effective, but they also cannot recommend venues to users who are not actually members of the LBSN. In contrast, in this paper, we introduce a venue-centric yet personalised probabilistic approach that suggests personalised and popular venues for users to visit in the near future. In our approach, we probabilistically incorporate two components: a popularity component for predicting the popularity of a venue at a given point in time, as estimated from the attendance of the venue in the LBSN (i.e. number of check-ins), and a personalisation component for identifying its interestingness with respect to the estimated preferences of the user. The popularity of each venue is predicted using time series forecasting models that are trained on the recent attendance trends of the venue, while the users' interests are modelled from the entity pages that they like on Facebook. Using three major cities, we conduct a user study to evaluate the effectiveness of the two components of our approach in suggesting venues for different types of users at different times of the day. Our experimental results show that an approach that combines the popularity and personalisation components is able to consistently outperform the recommendation service of the leading Foursquare LBSN. We also find that combining popularity and personalisation is effective for both new visitors and residents, while former visitors prefer popular venues.
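To make the two-component idea above concrete, here is a minimal Python sketch. It is not the authors' model: the abstract does not say which forecasting model or combination rule is used, so this sketch assumes simple exponential smoothing as a stand-in forecaster and a plain product of normalised popularity and personalisation scores. The names `forecast_next`, `normalise`, `rank_venues`, `HISTORY`, and `INTEREST`, and all numbers, are hypothetical.

```python
from typing import Dict, List


def forecast_next(checkins: List[float], alpha: float = 0.5) -> float:
    """Forecast the next check-in count with simple exponential smoothing.

    Assumption: a stand-in for whatever time series model is actually used.
    """
    level = checkins[0]
    for observed in checkins[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores.values())
    if total == 0:
        return {key: 1.0 / len(scores) for key in scores}
    return {key: value / total for key, value in scores.items()}


def rank_venues(history: Dict[str, List[float]],
                interest: Dict[str, float]) -> List[str]:
    """Rank venues by (forecast popularity) x (user interest).

    Assumption: the product is only one plausible way to combine the two.
    """
    popularity = normalise({v: forecast_next(c) for v, c in history.items()})
    personal = normalise(interest)
    combined = {v: popularity[v] * personal.get(v, 0.0) for v in popularity}
    return sorted(combined, key=combined.get, reverse=True)


# Invented toy data: recent check-in counts per venue, oldest first.
HISTORY = {"cafe": [10, 12, 14], "museum": [30, 20, 10], "bar": [5, 5, 5]}
# Invented interest scores, e.g. derived from a user's liked pages.
INTEREST = {"cafe": 0.6, "museum": 0.1, "bar": 0.3}

print(rank_venues(HISTORY, INTEREST))
```

In this toy data the museum has the highest forecast attendance, yet the cafe ranks first because the user's interest in it is much stronger; that trade-off is what combining the two components is meant to capture.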
Suggesting venues to a user in a given geographic context is an emerging task that is currently attracting a lot of attention. Existing studies in the literature consist of approaches that rank candidate venues based on different features of the venues and the user, which focus on modelling either the preferences of the user or the quality of the venue. However, while providing insightful results and conclusions, none of these studies have explored the relative effectiveness of these different features. In this paper, we explore a variety of user-dependent and venue-dependent features and apply state-of-the-art learning to rank approaches to the problem of contextual suggestion in order to find what makes a venue relevant for a given context. Using the test collection of the TREC 2013 Contextual Suggestion track, we perform a number of experiments to evaluate our approach. Our results suggest that a learning to rank technique can significantly outperform a Language Modelling baseline that models the positive and negative preferences of the user. Moreover, despite the fact that the contextual suggestion task is a personalisation task (i.e. providing the user with personalised suggestions of venues), we surprisingly find that user-dependent features are less effective than venue-dependent features for estimating the relevance of a suggestion.
OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited version published in: http://oatao.univ-toulouse.fr/ Eprints ID: 18777
The contribution was presented at CIKM 2016: http://cikm2016.cs.iupui.edu/
ABSTRACT. Information Retrieval (IR) systems heavily rely on a large number of parameters, such as the retrieval model or various query expansion parameters, whose values greatly influence the overall retrieval effectiveness. However, setting all these parameters individually can often be a tedious task, since they can all affect one another, while also varying for different queries. We propose to tackle this problem by dealing with entire system configurations (i.e. a set of parameters representing an IR system) instead of single parameters, and to apply state-of-the-art Learning to Rank techniques to select the most appropriate configuration for a given query. The experiments we conducted on two TREC AdHoc collections show that this approach is feasible and significantly outperforms the traditional way to configure a system, as well as the top performing systems of the TREC tracks. We also show an analysis on the impact of different features on the model's learning capability.
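The idea of choosing a whole configuration per query can be illustrated with a small Python sketch. It is a deliberately simplified pointwise stand-in, not the Learning to Rank techniques used in the paper: a linear model is fitted by stochastic gradient descent to predict effectiveness from a configuration feature and a query-configuration interaction feature, and the configuration with the highest predicted effectiveness is selected. All names (`features`, `train`, `select_config`, `TOY_SAMPLES`, `CONFIGS`), the single query feature, and the effectiveness values are invented for illustration.

```python
from typing import List, Sequence, Tuple

# (configuration name, uses_expansion flag); both configurations are hypothetical.
CONFIGS: List[Tuple[str, float]] = [("bm25", 0.0), ("bm25+expansion", 1.0)]

# Invented training triples: (normalised query length, uses_expansion, effectiveness).
# In this toy data, expansion helps short queries and hurts long ones.
TOY_SAMPLES = [
    (0.2, 0.0, 0.3), (0.2, 1.0, 0.5),
    (1.0, 0.0, 0.3), (1.0, 1.0, 0.2),
]


def features(query_len: float, uses_expansion: float) -> List[float]:
    """Bias, configuration feature, and a query x configuration interaction.

    The interaction term matters: without it, a linear model would rank the
    configurations identically for every query.
    """
    return [1.0, uses_expansion, query_len * uses_expansion]


def train(samples: Sequence[Tuple[float, float, float]],
          epochs: int = 2000, lr: float = 0.1) -> List[float]:
    """Fit a linear effectiveness predictor with plain SGD on squared error."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for query_len, uses_expansion, effectiveness in samples:
            x = features(query_len, uses_expansion)
            error = effectiveness - sum(w * xi for w, xi in zip(weights, x))
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


def select_config(weights: Sequence[float], query_len: float,
                  configs: Sequence[Tuple[str, float]]) -> str:
    """Return the name of the configuration with the highest predicted score."""
    def predicted(config: Tuple[str, float]) -> float:
        x = features(query_len, config[1])
        return sum(w * xi for w, xi in zip(weights, x))
    return max(configs, key=predicted)[0]


weights = train(TOY_SAMPLES)
print(select_config(weights, 0.1, CONFIGS))  # a short query
print(select_config(weights, 0.9, CONFIGS))  # a long query
```

A real system would use many query and configuration features and a pairwise or listwise ranker; the sketch only shows why the selection becomes query dependent once query and configuration features interact.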
In this paper, we study the emerging Information Retrieval (IR) task of contextual suggestion in location-based social networks. The aim of this task is to make personalised recommendations of venues for entertainment or activities whilst visiting a city, by appropriately representing the context of the user, such as their location and personal interests. Instead of only representing the specific low-level interests of a user, our approach is driven by estimates of the high-level categories of venues that the user may be interested in. Moreover, we argue that an effective model for contextual suggestion should not only promote the categories that the user is interested in, but it should also be capable of eliminating redundancy by diversifying the recommended venues in the sense that they should cover various categories of interest to the given user. Therefore, we adapt web search result diversification approaches to the task of contextual suggestion. For categorising the venues, we use the category classifications employed by location-based social networks such as Foursquare, urban guides such as Yelp, and a large collection of web pages, the ClueWeb12 corpus, to build a textual classifier that is capable of predicting the category distribution for a certain venue given its web page. We thoroughly evaluate our approach using the TREC 2013 Contextual Suggestion track. We conduct a number of experiments where we consider venues from the closed environments of both Foursquare and Yelp, and the general web using the ClueWeb12 corpus. Our empirical results suggest that category diversification consistently improves the effectiveness of the recommendation model over a reasonable baseline that only considers the similarity between the user's profile and venue. The results also give insights into the effectiveness of our approach with different types of users.
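As an illustration of category-based diversification, the following Python sketch greedily re-ranks venues so that the selection covers the user's categories of interest. The abstract does not name the diversification approach that was adapted, so this sketch assumes a greedy objective in the style of the xQuAD web search diversification framework; the function `diversify`, the data, and the parameter `lam` are hypothetical.

```python
from typing import Dict, List


def diversify(relevance: Dict[str, float],
              venue_categories: Dict[str, Dict[str, float]],
              user_categories: Dict[str, float],
              k: int, lam: float = 0.5) -> List[str]:
    """Greedy xQuAD-style re-ranking (an assumed stand-in, see lead-in).

    relevance:        profile-to-venue similarity for each venue
    venue_categories: predicted category distribution of each venue
    user_categories:  estimated interest of the user in each category
    lam:              trade-off between relevance (0) and diversity (1)
    """
    selected: List[str] = []
    remaining = set(relevance)
    # Probability that each category is still not covered by the selection.
    not_covered = {category: 1.0 for category in user_categories}

    def score(venue: str) -> float:
        diversity = sum(
            user_categories[c] * venue_categories[venue].get(c, 0.0) * not_covered[c]
            for c in user_categories
        )
        return (1 - lam) * relevance[venue] + lam * diversity

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        for c in not_covered:
            not_covered[c] *= 1.0 - venue_categories[best].get(c, 0.0)
    return selected


# Invented toy data: two similar pubs and one museum.
RELEVANCE = {"pub_a": 0.9, "pub_b": 0.85, "museum": 0.6}
VENUE_CATEGORIES = {
    "pub_a": {"nightlife": 1.0},
    "pub_b": {"nightlife": 1.0},
    "museum": {"arts": 1.0},
}
USER_CATEGORIES = {"nightlife": 0.5, "arts": 0.5}

print(diversify(RELEVANCE, VENUE_CATEGORIES, USER_CATEGORIES, k=3))
```

With `lam=0` the ranking follows relevance alone and the two pubs come first; with the default `lam=0.5` the museum moves ahead of the second pub, because the nightlife category is already covered after the first pick.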