A new Mallows distance based metric for comparing clusterings

Zhou, Ding; Li, Jia; Zha, Hongyuan

doi:10.1145/1102351.1102481

Cited by 50 publications

(50 citation statements)

References 13 publications

(17 reference statements)


“…Given two users' community membership distributions, we use the Categorical Clustering Distance (CCD) [25] to compare the quality of these two distributions. This method relates only to users' community membership distributions.…”

Section: Evaluation Metric (mentioning)

confidence: 99%

Mining topics on participations for community discovery

Zheng, Guo, Yang, et al. 2011

Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval


Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first one, which we call d2d-link, indicates connectiveness among different documents, such as blog references and research paper citations. The other one, which we call u2u-link, represents co-occurrences or simultaneous participations of different users in one document, and typically each document from a u2u-link corpus has more than one user/author. Examples of u2u-link data include email archives and research paper co-authorship networks. Community discovery in d2d-link data has achieved much success, while methods for u2u-link data either make no use of the textual content of the documents or make oversimplified assumptions about the users and the textual content. In this paper we propose a general approach of community discovery for u2u-link data, i.e., multiple user data, by placing topical variables on multiple authors' participations in documents. Experiments on a research proceeding co-authorship corpus and a New York Times news corpus show the effectiveness of our model.


“…For partitionings $p(S_1)$, $p(S_2)$ find the optimal clusters correspondence, such that the sum of distances between matched clusters is minimal. Then the contribution of a single data vector $x_i \in S_c$ to the overall difference between cluster structures is given by [26] …”

Section: Clustering Stability (mentioning)

confidence: 99%

Quantitative description of 3D vascularity images: texture-based approach and its verification through cluster analysis

Klepaczko, Kocinski, Materka 2010

Pattern Anal Applic


This paper undertakes the problem of quantitative inspection of 3D vascular tree images. Through the use of cluster analysis, it confirms the correspondence between texture descriptors and various vessel system parameters, such as blood viscosity and the number of tree branches. Moreover, it is shown that unsupervised selection of significant texture parameters, especially in the synthetic data sets corresponding to noisy images, becomes feasible if the search for relevant attributes is guided by the clustering stability-based optimization criterion.
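The citation statement for this entry describes finding a cluster correspondence that minimizes the sum of distances between matched clusters. A minimal sketch of that matching step follows. It assumes, for illustration only, that each cluster is summarized by a centroid and that clusters are compared by Euclidean distance; neither choice is confirmed by the quoted text, and the function name `optimal_cluster_matching` is hypothetical. The search is brute force over all permutations, which is only practical for a small number of clusters.

```python
from itertools import permutations
from math import dist  # Euclidean distance, Python 3.8+


def optimal_cluster_matching(centroids_a, centroids_b):
    """Return the one-to-one matching with minimal total centroid distance.

    Assumption: clusters are represented by centroids and compared with
    Euclidean distance. The result is (perm, cost), where perm[i] is the
    index in centroids_b matched to cluster i of centroids_a.
    """
    k = len(centroids_a)
    if k != len(centroids_b):
        raise ValueError("both partitionings must have the same number of clusters")

    best_cost, best_perm = float("inf"), None
    # Try every possible one-to-one assignment and keep the cheapest.
    for perm in permutations(range(k)):
        cost = sum(dist(centroids_a[i], centroids_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, best_cost


# Two partitionings of the same data whose cluster labels are shuffled.
centroids_a = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
centroids_b = [(10.0, 1.0), (0.0, 1.0), (5.0, 4.0)]

matching, total = optimal_cluster_matching(centroids_a, centroids_b)
print(matching, total)
```

For more than a handful of clusters, an assignment solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop, since the brute-force search grows factorially with the number of clusters.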


“…We use the Categorical Clustering Distance (CCD) [7] to compare the similarity between the computed community distribution and the ideal community distribution. For PAPER, we treat each conference as a community and the proportion of the number of papers one author published in each conference as the ideal probability the author belongs to that community.…”

Section: Community Membership Evaluation (mentioning)

confidence: 99%

A topical link model for community discovery in textual interaction graph

Zheng, Guo, Yang, et al. 2010

Proceedings of the 19th ACM International Conference on Information and Knowledge Management


This paper is concerned with community discovery in textual interaction graph, where the links between entities are indicated by textual documents. Specifically, we propose a Topical Link Model (TLM), which leverages Hierarchical Dirichlet Process (HDP) to introduce hidden topical variable of the links. Other than the use of links, TLM can look into the documents on the links in detail to recover sound communities. Moreover, TLM is a nonparametric model, which is able to learn the number of communities from the data. Extensive experiments on two real world corpora show TLM outperforms two state-of-the-art baseline models, which verify the effectiveness of TLM in determining the proper number of communities and generating sound communities.
