Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011
DOI: 10.1145/2020408.2020503
Latent topic feedback for information retrieval

Abstract: We consider the problem of a user navigating an unfamiliar corpus of text documents where document metadata is limited or unavailable, the domain is specialized, and the user base is small. These challenging conditions may hold, for example, within an organization such as a business or government agency. We propose to augment standard keyword search with user feedback on latent topics. These topics are automatically learned from the corpus in an unsupervised manner and presented alongside search results. User …

Cited by 61 publications (19 citation statements)
References 25 publications
“…Probabilistic topic modeling (and especially Latent Dirichlet Allocation) for information retrieval has been widely used recently in several ways (Andrzejewski, Buttler, 2011; Lu et al., 2011; Park, Ramamohanarao, 2009; Wei, Croft, 2006; Yi, Allan, 2009), and all studies reported improvements in document retrieval effectiveness. The main idea is to build a static topic model (using either LSA, pLSA, or LDA) of the collection, which is never further updated, and to smooth the document language model by incorporating probabilities of words that belong to topics matching the query (Lu et al., 2011; Park, Ramamohanarao, 2009; Wei, Croft, 2006; Yi, Allan, 2009).…”
Section: Related Work
confidence: 99%
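The topic-smoothed document language model described in the statement above can be sketched as follows. This is a minimal toy illustration in the spirit of that approach, not any cited paper's implementation; the function name, the interpolation weight `lam`, and the toy distributions are all assumptions, and a real system would take P(w|z) and P(z|d) from a trained LDA model.

```python
# Sketch of smoothing a document language model with topic probabilities:
# P(w|d) = lam * P_ml(w|d) + (1 - lam) * sum_z P(w|z) * P(z|d)
# (hypothetical helper; distributions below are toy values, not model output)

def smoothed_prob(word, doc_counts, doc_len, p_w_given_z, p_z_given_d, lam=0.7):
    """Interpolate the maximum-likelihood estimate with a topic-based estimate."""
    p_ml = doc_counts.get(word, 0) / doc_len  # maximum-likelihood P_ml(w|d)
    p_topic = sum(p_w_given_z[z].get(word, 0.0) * p_z_given_d[z]
                  for z in p_z_given_d)       # sum_z P(w|z) * P(z|d)
    return lam * p_ml + (1 - lam) * p_topic

# Toy example with two topics
p_w_given_z = {0: {"government": 0.2, "sports": 0.01},
               1: {"government": 0.01, "sports": 0.3}}
p_z_given_d = {0: 0.8, 1: 0.2}
doc_counts = {"government": 3, "policy": 2}

score = smoothed_prob("government", doc_counts, doc_len=5,
                      p_w_given_z=p_w_given_z, p_z_given_d=p_z_given_d)
# score ≈ 0.4686, i.e. 0.7 * 0.6 + 0.3 * (0.2*0.8 + 0.01*0.2)
```

The topic term lets a query word contribute to a document's score even when the word never occurs in the document, as long as the document's topics assign it probability.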
“…A second development has been the commoditization of Map/Reduce distributed processing [28] and of large-scale distributed file systems [38] by Hadoop. The HBase project provides a large NoSQL key-value storage system on the Hadoop file system that provides fast access to hundreds of millions of records.…”
Section: Infrastructure Development and Deployment
confidence: 99%
“…Once learned, these topics should correlate well with human concepts, for example, one model might produce topics that cover ideas such as government affairs, sports, and movies. With these unsupervised methods, we can utilize useful semantic information in a variety of tasks that depend on identifying unique topics or concepts, such as distributional semantics [52], word sense induction [113,18], and information retrieval [4].…”
Section: Exploring Topic Coherence Over Many Models and Many Topics
confidence: 99%
“…Topic modeling based on Latent Dirichlet Allocation (LDA) [6] has become a popular tool for data exploration, dimensionality reduction and for facilitating myriad other tasks [2,1,12]. As a fully unsupervised technique, however, topic models are unequipped to utilize limited supervisory information, e.g.…”
Section: Introduction
confidence: 99%