Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?

Houle, Michael E.; Kriegel, Hans‐Peter; Kröger, Peer; Schubert, Erich; Zimek, Arthur

doi:10.1007/978-3-642-13818-8_34

Cited by 213 publications

(169 citation statements)

References 36 publications

Citation statement types: Supporting, Mentioning (165), Contrasting, Unclassified


“…Regardless of the symbol set employed, it is clear that the approach described can lead to sparse elements embedded in high dimensional vector spaces. While data sets of this kind can be potentially problematic Beyer et al (1999); Hinneburg et al (2000); Houle et al (2010); Steinbach et al (2003), subspace dimension reduction techniques are derivable from LSI approaches such as the SVD. The IR techniques introduced above are readily applicable in any setting where bioinformatics data (sequence, structural, symbolic, etc) can be encoded.…”

Section: Discussion (mentioning)

confidence: 99%

Vector Space Information Retrieval Techniques for Bioinformatics Data Mining

Sakk, Odebode, 2011

Bioinformatics - Trends and Methodologies


“…This principle of a common set of NN in different dimensions is similar to the concept of the shared nearest neighbor distance [6] or consensus methods. The intuition is that the member dimensions of a subspace agree (to a certain minimum threshold) in their NN rankings, when considered individually.…”

Section: Definition Of Subspace Nearest Neighbor Search (mentioning)

confidence: 99%

Subspace Nearest Neighbor Search - Problem Statement, Approaches, and Discussion

Hund, Behrisch, Färber et al. 2015

Similarity Search and Applications


Abstract. Computing the similarity between objects is a central task for many applications in the field of information retrieval and data mining. For finding k-nearest neighbors, typically a ranking is computed based on a predetermined set of data dimensions and a distance function, constant over all possible queries. However, many high-dimensional feature spaces contain a large number of dimensions, many of which may contain noise, irrelevant, redundant, or contradicting information. More specifically, the relevance of dimensions may depend on the query object itself, and in general, different dimension sets (subspaces) may be appropriate for a query. Approaches for feature selection or -weighting typically provide a global subspace selection, which may not be suitable for all possible queries. In this position paper, we frame a new research problem, called subspace nearest neighbor search, aiming at multiple query-dependent subspaces for nearest neighbor search. We describe relevant problem characteristics, relate to existing approaches, and outline potential research directions.
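The shared nearest neighbor idea referred to in the citation statement above can be illustrated with a small sketch. It assumes the common overlap-count form of the similarity (the number of k-nearest neighbors two points have in common, divided by k) and Euclidean distance as the primary distance; it is not the per-dimension consensus scheme of the citing paper. The function names `knn_sets` and `snn_similarity` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def knn_sets(X, k):
    """For each row of X, return the index set of its k nearest neighbors
    under Euclidean distance, excluding the point itself."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    order = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return [set(row.tolist()) for row in order]

def snn_similarity(X, k):
    """Pairwise shared-neighbor similarity: |NN_k(x) & NN_k(y)| / k.
    Assumed overlap-count form; values lie in [0, 1]."""
    nn = knn_sets(X, k)
    n = len(nn)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = len(nn[i] & nn[j]) / k
    return sim

# Toy data (hypothetical): two well-separated groups of three points each.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
S = snn_similarity(X, k=2)
```

Points in the same group share neighbors and get a positive similarity, while points in different groups share none. The quadratic pairwise loop is kept for readability; it is a sketch, not an efficient implementation.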


“…According to [14], the shared nearest neighborhood (SNN) method can be used due to its robustness in high dimension dataset. However, SNN is not efficient because of its complexity.…”

Section: The kDDBSCAN Algorithm (mentioning)

confidence: 99%

A k-Deviation Density Based Clustering Algorithm

Chen, Yang et al. 2018

Mathematical Problems in Engineering


Due to the adoption of global parameters, DBSCAN fails to identify clusters with different and varied densities. To solve the problem, this paper extends DBSCAN by exploiting a new density definition and proposes a novel algorithm called k-deviation density based DBSCAN (kDDBSCAN). Various datasets containing clusters with arbitrary shapes and different or varied densities are used to demonstrate the performance and investigate the feasibility and practicality of kDDBSCAN. The results show that kDDBSCAN performs better than DBSCAN.
