1973
DOI: 10.1109/t-c.1973.223640
Clustering Using a Similarity Measure Based on Shared Near Neighbors

Cited by 873 publications (490 citation statements)
References 9 publications
“…Early approaches exploited the so-called guilt-by-association (GBA) rule, which makes predictions based on the majority or weighted majority of labels in the direct neighborhood, assuming that interacting nodes are likely to share similar properties [38,57]. Analogously, k-nearest neighbor (kNN) methods consider only the labels of the k most similar neighbors [32]; in turn, shared-similarity metrics, such as those proposed in [28,18], can be introduced to generalize the notion of pairwise similarity among nodes by taking into account the contribution of shared neighbors [14,9]. Other methodologies predict labels by propagating node labels to neighbors with an iterative process until convergence [70,69], or by evaluating the functional flows through the nodes of the graph [62,49].…”
Section: Introduction
confidence: 99%
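The guilt-by-association rule described in the excerpt above can be sketched as a simple majority vote over a node's direct neighbors. The graph, labels, and function name below are illustrative assumptions, not taken from any of the cited papers:

```python
from collections import Counter

def gba_predict(node, graph, labels):
    """Guilt-by-association sketch: predict a node's label by majority
    vote over the labels of its already-labeled direct neighbors."""
    neighbor_labels = [labels[n] for n in graph[node] if n in labels]
    if not neighbor_labels:
        return None  # no labeled neighbors, no prediction
    return Counter(neighbor_labels).most_common(1)[0][0]

# Toy interaction graph: node -> set of neighbors (hypothetical data)
graph = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
labels = {"b": "enzyme", "c": "enzyme", "d": "transport"}
print(gba_predict("a", graph, labels))  # majority label among a's neighbors
```

A weighted-majority variant would simply multiply each neighbor's vote by its edge weight before counting.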
“…For computing the nearest neighbors in high dimensional data, SNN measures have been reported to be effective in practice, and supposedly less prone to the curse of dimensionality than conventional distance measures. SNN measures have found use in the design of merge criteria of agglomerative clustering algorithms [25,27,28], in approaches for clustering high-dimensional data sets [26,29], and in finding outliers in subspaces of high dimensional data [30]. However, in all of these studies, no systematic investigation has been made into the advantages of SNN measures over conventional distance measures for high-dimensional data.…”
Section: Introduction
confidence: 99%
“…[8]). A proximity measure that is better suited for multidimensional data was proposed in [6]. In this paper, proximity between a pair of points was defined to be the number of neighbors they share.…”
Section: Estimating Proximity in Multidimensional Spaces with Sparse…
confidence: 99%
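The shared-near-neighbor proximity defined in the excerpt above (the number of neighbors a pair of points has in common) can be sketched directly from a precomputed distance matrix. The function names and the toy data are assumptions for illustration:

```python
def knn_indices(i, dist, k):
    """Indices of the k nearest neighbors of point i, excluding i itself,
    from a precomputed pairwise distance matrix."""
    order = sorted((j for j in range(len(dist)) if j != i),
                   key=lambda j: dist[i][j])
    return set(order[:k])

def snn_proximity(i, j, dist, k):
    """Shared-near-neighbor proximity: how many points appear in both
    i's and j's k-nearest-neighbor sets."""
    return len(knn_indices(i, dist, k) & knn_indices(j, dist, k))

# Two tight 1-D groups of points (hypothetical data)
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in pts] for a in pts]
print(snn_proximity(0, 1, dist, k=2))  # → 1, same group shares a neighbor
print(snn_proximity(0, 3, dist, k=2))  # → 0, different groups share none
```

Because the count depends only on neighborhood rank, not raw distance, it degrades more gracefully in high dimensions, which is the motivation raised in the preceding excerpt.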
“…The algorithm, named Clustering With Nearest Neighborhood (CWNN), is inspired by ideas presented in [2], [6] and [7]. CWNN employs the SNN graph to detect the so-called core data points.…”
Section: Introduction
confidence: 99%
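CWNN's exact core-point criterion is not given in this excerpt; the sketch below shows one plausible SNN-graph-based definition in the Jarvis–Patrick spirit, where a point is "core" if enough of its neighbors share a sufficiently large neighborhood with it. All names and thresholds (`snn_core_points`, `sim_threshold`, `min_strong`) are assumptions:

```python
def knn_sets(dist, k):
    """k-nearest-neighbor sets for every point from a distance matrix."""
    sets = []
    for i in range(len(dist)):
        order = sorted((j for j in range(len(dist)) if j != i),
                       key=lambda j: dist[i][j])
        sets.append(set(order[:k]))
    return sets

def snn_core_points(dist, k, sim_threshold, min_strong):
    """Illustrative core-point rule: point i is core when at least
    `min_strong` of its k nearest neighbors share >= `sim_threshold`
    near neighbors with it (CWNN's actual definition may differ)."""
    nbrs = knn_sets(dist, k)
    core = []
    for i in range(len(dist)):
        strong = sum(1 for j in nbrs[i]
                     if len(nbrs[i] & nbrs[j]) >= sim_threshold)
        if strong >= min_strong:
            core.append(i)
    return core

# Two tight 1-D groups of points (hypothetical data)
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in pts] for a in pts]
print(snn_core_points(dist, k=3, sim_threshold=2, min_strong=2))
```

Clusters are then typically grown by connecting core points whose SNN similarity exceeds the threshold, with remaining points attached to the nearest core or marked as noise.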