2015
DOI: 10.1016/j.knosys.2015.06.010
Nearest neighbor regression in the presence of bad hubs

Abstract: Prediction on a numeric scale, i.e., regression, is one of the most prominent machine learning tasks, with various applications in finance, medicine, and the social and natural sciences. Due to its simplicity, theoretical performance guarantees, and successful real-world applications, one of the most popular regression techniques is k nearest neighbor regression. However, k nearest neighbor approaches are affected by the presence of bad hubs, a recently observed phenomenon according to which some of the instances ar…
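The abstract refers to plain k nearest neighbor regression: predict the target of a query as the average target of its k closest training points. A minimal self-contained sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    """Plain k-nearest-neighbor regression: predict the mean target
    of the k training points closest to the query (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nn_idx = np.argsort(dists)[:k]
    return y_train[nn_idx].mean()

# toy usage: the target is x[0] + x[1]
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 2.0])
pred = knn_regress(X, y, np.array([0.9, 0.9]), k=3)
```

For the query (0.9, 0.9) the three nearest points are (1,1), (1,0), and (0,1), so the prediction is the mean of their targets, (2 + 1 + 1) / 3.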

Cited by 40 publications (21 citation statements)
References 35 publications (47 reference statements)
“…This means, roughly speaking, that bad hubs are expected in complex data, such as drug-target interaction data. For a more detailed discussion, we refer to [7].…”
Section: ECkNN: k-Nearest Neighbor Regression With Error Correction (mentioning)
confidence: 99%
“…[8] [28] [46], and hubness-aware classifiers have been developed; see [45] for a survey. More recently, hubness-aware regression techniques, including k-nearest neighbor with error correction (ECkNN), were developed that allow for predictions on a continuous scale [7]. Despite the fact that hubness-aware techniques are among the most promising recent machine learning approaches, their potential to enhance drug-target interaction prediction methods has not been exploited yet: to the best of our knowledge, our initial work [6] is the only one aiming to apply hubness-aware models to the drug-target prediction problem.…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to keep the example simple, we use k = 1 nearest neighbour to calculate the corrected labels of training instances. In the figure, directed edges point from each instance to its first nearest neighbour. For more details about ECkNN we refer to [18]. As mentioned previously, the dynamics of typing is captured by time series data.…”
Section: Nearest Neighbour Regression With Error Correction (mentioning)
confidence: 99%
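One plausible reading of the error-correction step quoted above, sketched here under an assumption: each training label is replaced by the mean original label of the instance's reverse nearest neighbors (the instances whose k-NN edges point at it), so that hubs carry labels consistent with the queries they serve. This is an illustrative reconstruction of the idea, not the authors' exact ECkNN rule; see [7]/[18] for the actual algorithm.

```python
import numpy as np

def corrected_labels(X, y, k=1):
    """Hypothetical sketch of reverse-neighbor label correction:
    replace each label by the mean original label of the instances
    that have this point among their k nearest neighbors; points
    with no reverse neighbors keep their original label.
    Illustrative only, not the exact ECkNN update."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors of each instance
    corrected = y.astype(float).copy()
    for j in range(n):
        rev = np.where((nn == j).any(axis=1))[0]  # reverse neighbors of j
        if len(rev) > 0:
            corrected[j] = y[rev].mean()
    return corrected

# toy usage on 1-D points; the isolated point (10) is an "anti-hub"
# with no reverse neighbors, so its label is left unchanged
out = corrected_labels(np.array([[0.0], [1.0], [3.0], [10.0]]),
                       np.array([0.0, 1.0, 3.0, 10.0]), k=1)
```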
“…We set k = 5 for ECkNN, which is in accordance with other works on hubness-aware machine learning [18,22].…”
Section: Evaluation Of Pairwise Models (mentioning)
confidence: 99%
“…Hubness has been identified as a detrimental factor in similarity-based machine learning, impairing several classification [44], clustering [47,60], regression [7], graph analysis [22], visualization [17], and outlier detection [18,19,45] methods. Reports on affected tasks include multimedia retrieval [51], recommendation [48], collaborative filtering [25,34], speaker verification [50], speech recognition [62], and image data classification [58].…”
Section: Introduction (mentioning)
confidence: 99%
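Hubness, as discussed in the citation statements above, is conventionally quantified by the k-occurrence count N_k(x): the number of instances that have x among their k nearest neighbors. A strongly skewed N_k distribution signals hubness; instances with very large N_k are hubs (and bad hubs are those whose appearances as neighbors tend to cause errors). A minimal sketch (function name is illustrative):

```python
import numpy as np

def k_occurrence(X, k=5):
    """N_k(x): for each instance, count how many other instances
    have it among their k nearest neighbors (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors of each instance
    return np.bincount(nn.ravel(), minlength=len(X))

# toy usage on 1-D points: the middle point (1) is a small hub,
# the isolated point (10) an anti-hub with N_1 = 0
counts = k_occurrence(np.array([[0.0], [1.0], [3.0], [10.0]]), k=1)
```

The counts always sum to n * k, so any excess at one instance (a hub) is paid for by anti-hubs with N_k near zero.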