2009
DOI: 10.1145/1568292.1568295

Web Search Clustering and Labeling with Hidden Topics

Abstract: Web search clustering is a solution to reorganize search results (also called "snippets") in a more convenient way for browsing. There are three key requirements for such post-retrieval clustering systems: (1) the clustering algorithm should group similar documents together; (2) clusters should be labeled with descriptive phrases; and (3) the clustering system should provide high-quality clustering without downloading the whole Web page. This article introduces a novel framework for clustering Web search result…

Cited by 19 publications (10 citation statements)
References 30 publications
“…Tseng's approach [19] labels clusters by mapping category terms to generic terms. Before this research, rules are used in Nguyen's [13] work to find readable phrases. However, the rules used in these papers are mostly lexical rules that cannot cover syntactical features.…”
Section: Related Work (mentioning)
confidence: 99%
“…Keywords are used in some existing research but a single term rarely gives users enough information. Existing research has reported that phrases [10,13] are more informative than keywords for understanding. However, the readability of phrases is rarely studied in existing research because it is very difficult to formalize the measurement of readability for phrases.…”
Section: Introduction (mentioning)
confidence: 99%
“…The basic idea is that if an n-word sequence tends to appear together, the sequence is more likely to be an n-word phrase. The phrases extracted from a text collection are considered as the label candidates of the text collection [37,38]. Mei et al. [39] and Lau et al. [40] reported that two-word phrases (bigrams) usually work better than other n-word phrases for label generation.…”
Section: Evaluation Of Facet Labeling Based On Degree Centrality and … (mentioning)
confidence: 99%
“…Very little further work on this topic has been done: vector-based WSI was successfully shown to improve bag-of-words ad-hoc Information Retrieval [36] and experimental studies [10] have provided interesting, though preliminary, insights into the use of WSI for Web search result clustering. More recently the use of hidden topics has been proposed to identify query meanings [29]. However, topics - estimated from a universal dataset - are query-independent and thus their number needs to be found beforehand.…”
Section: Related Work (mentioning)
confidence: 99%