The hubness phenomenon has recently come into focus as an important aspect of the curse of dimensionality that affects many instance-based learning systems. It manifests as a long-tailed distribution of instance relevance within the models, where a small number of hub points dominates the analysis and influences many predictions. High data hubness is therefore often linked to poor system performance. In this paper, we re-examine several hubness-aware metric learning strategies that aim to improve system performance by reducing the expected data hubness. Instead of observing only the expected hubness degrees, our comparisons evaluate the shape of the induced hubness-degree distribution, in order to better estimate the associated hubness risk. We find that the hubness-degree distribution tends to become highly skewed in high dimensions, so that many samples exhibit substantially higher hubness than expected. We argue that metrics inducing long-tailed, high-variance hubness-degree distributions can be susceptible to highly detrimental high-hubness events, even if they reduce the expected hubness of the data. The experiments indicate significant differences between the compared hubness-aware metric learning approaches and show that simhubs entails the lowest overall hubness risk with increasing dimensionality. This is further shown to improve classifier stability in k-nearest neighbor classification.

Keywords: hubness, curse of dimensionality, metric learning, secondary distances, classification

1 Introduction

Instance-based learning in many dimensions is known to be quite challenging [7] due to various adverse effects of the curse of dimensionality [3]. The contrast between relevant and irrelevant points is often reduced by distance concentration [12][18][20], and nearest neighbors are considered far less meaningful in high-dimensional feature spaces [4][10]. Despite these difficulties, it is still possible to extract useful information from k-nearest neighbor sets, and kNN methods remain popular in many domains, including learning under class imbalance [13] and time series classification [43].
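To make the notion of hubness concrete, the following minimal sketch (an illustration only, not the experimental code of this paper; it assumes numpy, scipy, and scikit-learn are available, and the helper name hubness_skewness is ours) computes each point's k-occurrence degree N_k, i.e., how many kNN lists of other points it appears in, and summarizes the resulting distribution by its skewness.

    # Minimal hubness sketch: count k-occurrences N_k and report skewness.
    import numpy as np
    from scipy.stats import skew
    from sklearn.neighbors import NearestNeighbors

    def hubness_skewness(X, k=5):
        """Return the k-occurrence counts N_k and their skewness for data X."""
        n = X.shape[0]
        # Query k+1 neighbors since each point is its own nearest neighbor;
        # the self-match is dropped below.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)
        neighbors = idx[:, 1:]  # discard the self-neighbor column
        # N_k[i] = number of kNN lists in which point i occurs
        n_k = np.bincount(neighbors.ravel(), minlength=n)
        return n_k, skew(n_k)

    # Illustration on i.i.d. Gaussian data of increasing dimensionality.
    rng = np.random.default_rng(0)
    for d in (3, 30, 300):
        X = rng.standard_normal((1000, d))
        _, s = hubness_skewness(X, k=5)
        print(f"d={d:4d}  skewness of N_k = {s:.2f}")

On such synthetic data, the printed skewness of N_k typically grows with the dimensionality d, mirroring the skewing of the hubness-degree distribution discussed above.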