Topical clustering of search results

Scaiella, Ugo; Ferragina, Paolo; Marino, Andrea; Ciaramita, Massimiliano

doi:10.1145/2124295.2124324

Cited by 87 publications

(71 citation statements)

References 24 publications

“…TAG MY SEARCH [5] is an example of Wikipedia-based topic discovery applied to a related task of general Web search result clustering. Unlike ScienScan, TAG MY SEARCH uses only articles but not categories of Wikipedia to represent topics, and thus performs flat rather than hierarchical grouping of the search results.…”

Section: Related Work and Discussion (mentioning)

confidence: 99%

ScienScan – An Efficient Visualization and Browsing Tool for Academic Search

Mirylenka

Passerini

2013

Advanced Information Systems Engineering

Abstract. In this paper we present ScienScan, a browsing and visualization tool for academic search. The tool operates in real time by post-processing the query results returned by an academic search engine. ScienScan discovers topics in the search results and summarizes them in the form of a concise hierarchical topic map. The produced topical summary informatively represents the results in a visual way and provides an additional filtering control. We demonstrate the operation of ScienScan by deploying it on top of the search API of Microsoft Academic Search.

“…They use the K Nearest Neighbor (KNN) algorithm to find keyword clusters and then form document clusters by their similarity with each keyword cluster, but they do not have statistical analysis on cluster labeling. Scaiella et al. use a Wikipedia annotator TAGME to find the Wikipedia page titles associated with each document snippet [21]. In their keyword graph, a node is a Wikipedia page title (topic), the edge weights are the topic-to-topic similarities computed based on the Wikipedia linked-structure.…”

Section: A Review of ATG Approaches (mentioning)

confidence: 99%

“…This algorithm automatically detects the number of communities and generates compact taxonomies. It has an advantage over existing commercial systems such as carrotsearch.com and Yippy, and also some most recent works since these methods partition the document collection to about 10 clusters which is not always the real number of topics [2] [21]. While many state-of-the-art search result clustering algorithms are flat, our method applies the Fast Modularity algorithm recursively in a top-down manner until certain conditions are reached.…”

Section: Phase III: Community Mining (mentioning)

confidence: 99%

Text Document Topical Recursive Clustering and Automatic Labeling of a Hierarchy of Document Clusters

Chen

Zaïane

2013

Advances in Knowledge Discovery and Data Mining

Abstract. The overwhelming amount of textual documents available nowadays highlights the need for information organization and discovery. Effectively organizing documents into a hierarchy of topics and subtopics makes it easier for users to browse the documents. This paper borrows community mining from social network analysis to generate a hierarchy of topically coherent document clusters. It focuses on giving the document clusters descriptive labels. We propose to use betweenness centrality measure in networks of co-occurring terms to label the clusters. We also incorporate keyphrase extraction and automatic titling in cluster labeling. The results show that the cluster labeling method utilizing KEA to extract keyphrases from the documents generates the best labels overall compared to other methods and baselines.

“…However, it has been applied to clustering snippets resulting from Web search [19]. Each result snippet is annotated using TAGME.…”

Section: Introduction (mentioning)

confidence: 99%

A Graph-Based Approach to Topic Clustering for Online Comments to News

Aker

Kurtić

Balamurali

et al.

2016

Lecture Notes in Computer Science

Abstract. This paper investigates graph-based approaches to labeled topic clustering of reader comments in online news. For graph-based clustering we propose a linear regression model of similarity between the graph nodes (comments) based on similarity features and weights trained using automatically derived training data. To label the clusters our graph-based approach makes use of DBPedia to abstract topics extracted from the clusters. We evaluate the clustering approach against gold standard data created by human annotators and compare its results against LDA, currently reported as the best method for the news comment clustering task. Evaluation of cluster labelling is set up as a retrieval task, where human annotators are asked to identify the best cluster given a cluster label. Our clustering approach significantly outperforms the LDA baseline and our evaluation of abstract cluster labels shows that graph-based approaches are a promising method of creating labeled clusters of news comments, although we still find cases where the automatically generated abstractive labels are insufficient to allow humans to correctly associate a label with its cluster.
