2013
DOI: 10.4304/jetwi.5.2.91-97

Intrinsic Dimensionality Estimation for High-dimensional Data Sets: New Approaches for the Computation of Correlation Dimension

Abstract: The analysis of high-dimensional data is usually challenging since many standard modelling approaches tend to break down due to the so-called “curse of dimensionality”. Dimension reduction techniques, which reduce the data set (explicitly or implicitly) to a smaller number of variables, make the data analysis more efficient and are furthermore useful for visualization purposes. However, most dimension reduction techniques require fixing the intrinsic dimension of the low-dimensional sub…

citations
Cited by 10 publications
(19 citation statements)
references
References 19 publications
“…Thus, the number of sampling points, fallen into the neighborhood, should be proportional to the volume of the q-dimensional ball. This result was mentioned a number of times in the works (for example, Levina and Bickel, 2005; Einbeck and Kalantan, 2013; Singer and Wu, 2012). However, in the Theorem 1, we prove that the number of sampling points in the neighborhood, divided by the volume of the neighborhood, is a consistent estimate of the density at the point.…”
Section: Results
mentioning
confidence: 80%
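
The scaling in the statement above (neighbour counts growing like the volume of a q-dimensional ball, that is like r^q) is also the idea behind the correlation dimension named in the article title. The following is a minimal sketch of a two-radius slope estimate of the correlation integral, not the method of the cited article. The function name `correlation_dimension`, the radii, and the synthetic test data are all assumptions made for illustration.

```python
import numpy as np

def correlation_dimension(points, r_small, r_large):
    """Two-radius slope of the log correlation integral (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    # Euclidean distances between all distinct pairs (i < j).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]
    # Correlation integral C(r): fraction of pairs closer than r.
    c_small = np.mean(dists < r_small)
    c_large = np.mean(dists < r_large)
    # If C(r) behaves like r^q, the log-log slope approximates q.
    return (np.log(c_large) - np.log(c_small)) / (np.log(r_large) - np.log(r_small))

rng = np.random.default_rng(0)
# Hypothetical test data: 1000 points on a 2-D plane embedded in 5-D space.
plane = np.zeros((1000, 5))
plane[:, :2] = rng.uniform(size=(1000, 2))
estimate = correlation_dimension(plane, r_small=0.05, r_large=0.1)
print(f"estimated dimension: {estimate:.2f}")
```

On data like this the estimate should land near 2, slightly below it because points close to the boundary of the square have fewer neighbours. In practice the slope is usually fitted over a range of radii rather than two fixed values, and both radii must be large enough that some pairs fall within them.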
“…The most popular model of high-dimensional data, which occupy a very small part of observation space ℝ^p, is Manifold model in accordance with which the data lie on or near an unknown manifold (Data manifold, DM) X of lower dimensionality q<p embedded in an ambient high-dimensional input space ℝ^p (Manifold assumption (Seung and Lee, 2000) about high-dimensional data); typically, this assumption is satisfied for 'real-world' high-dimensional data obtained from 'natural' sources. In real examples, a manifold dimension q is usually unknown and can be estimated by a given dataset randomly sampled from the Data manifold (Levina and Bickel, 2005; Fan et al, 2009; Einbeck and Kalantan, 2013; Rozza et al, 2011).…”
Section: Introduction
mentioning
confidence: 99%
“…3.1 and 3.2, respectively, extending and formalizing preliminary ideas given in Ref. 17. We make the initial decision to normalize all columns of the dataset so that each of the M variables has a sample mean of zero and a sample standard deviation of 1.…”
Section: Proposed Intrinsic Dimension Estimation Methods
mentioning
confidence: 99%
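
The normalization step described in the statement above (each of the M variables rescaled to sample mean zero and sample standard deviation 1) can be sketched as follows. This is an illustration only: the function name `standardize_columns` and the example data are hypothetical, and the use of `ddof=1` assumes that "sample standard deviation" means the n-1 denominator.

```python
import numpy as np

def standardize_columns(data):
    """Rescale each column to sample mean 0 and sample standard deviation 1."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    # ddof=1 gives the sample standard deviation (assumed convention).
    std = data.std(axis=0, ddof=1)
    # Note: a constant column has std 0 and is not handled in this sketch.
    return (data - mean) / std

rng = np.random.default_rng(0)
# Hypothetical data set: 200 observations of M = 3 variables on different scales.
raw = rng.normal(loc=[5.0, -2.0, 10.0], scale=[2.0, 0.5, 30.0], size=(200, 3))
scaled = standardize_columns(raw)
```

Standardizing first keeps variables measured on large scales from dominating the distance computations that intrinsic dimension estimators rely on.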
“…Due to an increased interest in dimensionality reduction and manifold learning, a lot of techniques have been proposed in order to estimate the intrinsic dimensionality of a data set (Camastra, 2003;Brand, 2003;Costa and Hero, 2004;Kégl, 2003;Hein and Audibert, 2005;Levina and Bickel, 2005;Weinberger and Saul, 2006;Qiao and Zhang, 2009;Yata and Aoshima, 2010;Mo and Huang, 2012;Fan et al, 2013;Einbeck and Kalantan, 2013;He et al, 2014).…”
mentioning
confidence: 99%
“…Techniques for intrinsic dimensionality estimation can be divided into two main groups (van der Maaten, 2007;Einbeck and Kalantan, 2013): (1) estimators based on the analysis of local properties of the data (the correlation dimension estimator (Grassberger and Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality. .…”
mentioning
confidence: 99%