The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05)
DOI: 10.1109/wi.2005.138
Standardized Evaluation Method for Web Clustering Results

Cited by 17 publications (11 citation statements)
References 6 publications
“…Therefore, in this paper we used the Entropy (E) and Purity (P) measures to assess the performance of the algorithms. This is possible since the groups are known in advance, but this information is used here only for benchmarking purposes, which is a common practice in the literature (Crabtree et al., 2005; Zhao and Karypis, 2004). Given a cluster S_r of size n_r, the entropy E(S_r) of this cluster can be measured as follows: …”
Section: Methods (mentioning, confidence: 99%)
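The entropy formula itself is truncated in the snippet above, so the following is a hedged sketch using the standard per-cluster entropy and purity definitions from Zhao and Karypis (2004), which the citing paper references; the function names and the normalization by log(q) are assumptions, not taken from the snippet.

```python
from math import log

def cluster_entropy(class_counts):
    """Entropy of one cluster S_r, given the count of its members per true
    class. Normalized by log(q) (q = number of classes) so values lie in
    [0, 1]; lower is better. Standard Zhao & Karypis (2004) definition,
    assumed here because the snippet's formula is truncated."""
    n_r = sum(class_counts)          # cluster size
    q = len(class_counts)            # number of known classes
    if q < 2 or n_r == 0:
        return 0.0
    return -sum((c / n_r) * log(c / n_r)
                for c in class_counts if c) / log(q)

def cluster_purity(class_counts):
    """Purity of one cluster: fraction belonging to its dominant class.
    Higher is better; 1.0 means the cluster contains a single class."""
    n_r = sum(class_counts)
    return max(class_counts) / n_r if n_r else 0.0
```

For example, a cluster drawn entirely from one class, `[10, 0, 0]`, has entropy 0.0 and purity 1.0, while a cluster split evenly across two classes, `[5, 5]`, has entropy 1.0 and purity 0.5.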
“…In this study, nine representative cluster validity indexes (sum of squared error [23], entropy [23,25], relaxation error [26], Davies-Bouldin index, Calinski-Harabasz index, Silhouette statistic, Dunn index, SD validity index, and S_Dbw validity index) were used to evaluate the clustering results. These cluster validity indexes are commonly used to evaluate clustering results.…”
Section: Cluster Validity Index (mentioning, confidence: 99%)
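Three of the nine internal validity indexes named in the statement above (Davies-Bouldin, Calinski-Harabasz, and the Silhouette statistic) ship with scikit-learn, so a minimal sketch of how they are applied to a clustering result might look as follows; the synthetic two-blob dataset is an assumption for illustration, and the remaining indexes would need custom implementations or other packages.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    davies_bouldin_score, calinski_harabasz_score, silhouette_score)

rng = np.random.default_rng(0)
# Toy dataset (assumed for illustration): two well-separated 2-D blobs.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Internal indexes: computed from the data and labels alone,
# with no gold-standard classes needed.
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))    # lower is better
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels)) # higher is better
print("Silhouette:       ", silhouette_score(X, labels))        # higher is better
```

Because the blobs are well separated, all three indexes should report a strong clustering (low Davies-Bouldin, high Calinski-Harabasz and Silhouette); on real web-clustering output the scores are typically far less clear-cut.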
“…In Crabtree et al. (2005) it is suggested that evaluation methodologies can be split into two categories: internal quality, based on objective functions specific to the algorithm, and external quality, which evaluates the output clustering. External quality assessment can be further divided into gold-standard, task-oriented and user evaluation.…”
Section: Measuring Cluster Quality (mentioning, confidence: 99%)
“…The size of the dataset was chosen as this was the maximum number of results available from the default search engine API (Yahoo), as well as considering the inefficiency of downloading all the results for processing. This paper includes the results of TCA, FTCA, STC and Lingo on four queries used in other papers (Xiao and Hung 2008; Crabtree et al. 2005; Janruang and Kreesuradej 2006) (Jaguar, Apple, Java, Salsa), using the Yahoo! search API (except for the Jaguar dataset, which is taken from Xiao and Hung 2008), as well as results from ODP.…”
Section: Measuring Cluster Quality (mentioning, confidence: 99%)