Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011
DOI: 10.1145/2020408.2020503
Latent topic feedback for information retrieval

Abstract: We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User …

Cited by 61 publications (19 citation statements)
References 25 publications
“…Probabilistic topic modeling (and especially Latent Dirichlet Allocation) for information retrieval has been widely used recently in several ways (Andrzejewski, Buttler, 2011; Lu et al., 2011; Park, Ramamohanarao, 2009; Wei, Croft, 2006; Yi, Allan, 2009), and all studies reported improvements in document retrieval effectiveness. The main idea is to build a static topic model (using either LSA, pLSA, or LDA) of the collection, which is never further updated, and to smooth the document language model by incorporating probabilities of words that belong to topics matching the query (Lu et al., 2011; Park, Ramamohanarao, 2009; Wei, Croft, 2006; Yi, Allan, 2009).…”
Section: Related Work
confidence: 99%
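The topic-smoothed document language model described in the statement above can be sketched as follows. This is a minimal toy illustration in the spirit of that approach, not any cited paper's implementation; the function name, the interpolation weight `lam`, and the toy distributions are all assumptions, and a real system would take P(w|z) and P(z|d) from a trained LDA model.

```python
# Sketch of smoothing a document language model with topic probabilities:
# P(w|d) = lam * P_ml(w|d) + (1 - lam) * sum_z P(w|z) * P(z|d)
# (hypothetical helper; distributions below are toy values, not model output)

def smoothed_prob(word, doc_counts, doc_len, p_w_given_z, p_z_given_d, lam=0.7):
    """Interpolate the maximum-likelihood estimate with a topic-based estimate."""
    p_ml = doc_counts.get(word, 0) / doc_len  # maximum-likelihood P_ml(w|d)
    p_topic = sum(p_w_given_z[z].get(word, 0.0) * p_z_given_d[z]
                  for z in p_z_given_d)       # sum_z P(w|z) * P(z|d)
    return lam * p_ml + (1 - lam) * p_topic

# Toy example with two topics
p_w_given_z = {0: {"government": 0.2, "sports": 0.01},
               1: {"government": 0.01, "sports": 0.3}}
p_z_given_d = {0: 0.8, 1: 0.2}
doc_counts = {"government": 3, "policy": 2}

score = smoothed_prob("government", doc_counts, doc_len=5,
                      p_w_given_z=p_w_given_z, p_z_given_d=p_z_given_d)
# score ≈ 0.4686, i.e. 0.7 * 0.6 + 0.3 * (0.2*0.8 + 0.01*0.2)
```

The topic term lets a query word contribute to a document's score even when the word never occurs in the document, as long as the document's topics assign it probability.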
“…A second development has been the commoditization of Map/Reduce distributed processing [28] and of large-scale distributed file systems [38] by Hadoop. The HBase project provides a large NoSQL key-value storage system on the Hadoop file system that provides fast access to hundreds of millions of records.…”
Section: Infrastructure Development and Deployment
confidence: 99%
“…Once learned, these topics should correlate well with human concepts, for example, one model might produce topics that cover ideas such as government affairs, sports, and movies. With these unsupervised methods, we can utilize useful semantic information in a variety of tasks that depend on identifying unique topics or concepts, such as distributional semantics [52], word sense induction [113,18], and information retrieval [4].…”
Section: Exploring Topic Coherence Over Many Models and Many Topics
confidence: 99%
“…Topic modeling based on Latent Dirichlet Allocation (LDA) [6] has become a popular tool for data exploration, dimensionality reduction and for facilitating myriad other tasks [2,1,12]. As a fully unsupervised technique, however, topic models are unequipped to utilize limited supervisory information, e.g.…”
Section: Introduction
confidence: 99%