2017
DOI: 10.1038/s41598-017-11873-y

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Abstract: Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the crude large number of coordinates. Such manifold is generally twisted and curved; in addition points on it will be non-uniformly distributed: two factors that make the identification of the ID and its exploitation really hard. …
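The abstract describes an ID estimator that uses only the distances to each point's first and second nearest neighbors. As a rough illustration of that idea (a sketch, not the authors' reference implementation), the Python snippet below estimates the ID from the ratio μ = r2/r1, which for points drawn from a d-dimensional manifold follows a Pareto law with exponent d; the function name two_nn_id, the discard_fraction option and the least-squares fit through the origin are choices of this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def two_nn_id(X, discard_fraction=0.1):
    """Estimate the intrinsic dimension from the ratio of the distances to the
    second and first nearest neighbor of each point (two-NN idea).

    X : (N, D) array of points; duplicate points should be removed beforehand.
    The function name and the option to discard the largest ratios are choices
    of this sketch, not details taken from the paper.
    """
    tree = cKDTree(X)
    # k=3 returns each point itself plus its first and second neighbors.
    dist, _ = tree.query(X, k=3)
    r1, r2 = dist[:, 1], dist[:, 2]
    mu = np.sort(r2 / r1)              # distance ratios, mu >= 1
    n = len(mu)
    # Optionally drop the largest ratios, which are most affected by noise.
    keep = int(n * (1.0 - discard_fraction))
    mu = mu[:keep]
    # For points on a d-dimensional manifold, mu follows a Pareto law with
    # exponent d, so -log(1 - F(mu)) = d * log(mu); fit a line through the origin.
    F = np.arange(1, keep + 1) / n
    x = np.log(mu)
    y = -np.log(1.0 - F)
    return np.sum(x * y) / np.sum(x * x)

# Example: points on a 2-D plane embedded in 10 dimensions.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.zeros((2000, 10))
    X[:, :2] = rng.uniform(size=(2000, 2))
    print(two_nn_id(X))   # expected to be close to 2
```

Discarding the largest ratios before the fit is a common way to reduce the influence of points whose second-neighbor distance is dominated by noise or by the boundary of the manifold.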

Cited by 225 publications (314 citation statements)
References 22 publications
“…On the other hand, the dynamics of typical biomolecular systems such as proteins evolves in a high-dimensional phase space with an effective dimension that has been estimated to be between 5 and 10 (given optimal collective coordinates x_i that are usually not known). [44][45][46] Hence, even for appropriate dimensionality reduction and large data sets (N ∼ 10^7), the problem outlined in Fig. 1 is hard to avoid.…”
Section: Introduction (mentioning)
confidence: 99%
“…We have 1 + δ² + √((1 + δ²)² − 1) ≥ 1 + √2 δ, which leads to the desired bound, where we have used concavity of the log and the fact that 1 + √2 ≥ 2. It follows by combining (20), (21), (22) and (23), that Q : y ↦ R(‖y‖²) satisfies the claimed properties.…”
Section: Lemma 16 (mentioning)
confidence: 77%
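The inequality quoted above can be checked directly; the short derivation below is a verification sketch added here (it is not taken from the cited paper) and holds for δ ≥ 0.

```latex
% Verification sketch: since (1+\delta^2)^2 - 1 = \delta^2(2+\delta^2) \ge 2\delta^2
% for \delta \ge 0, taking square roots gives \sqrt{(1+\delta^2)^2 - 1} \ge \sqrt{2}\,\delta, hence
\[
  1 + \delta^2 + \sqrt{(1+\delta^2)^2 - 1}
  \;\ge\; 1 + \delta^2 + \sqrt{2}\,\delta
  \;\ge\; 1 + \sqrt{2}\,\delta .
\]
```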
“…where d is the intrinsic dimension (ID) of the dataset 46, ω_d is the volume of the d-sphere with unitary radius, and r_{i,k} is the distance of point i from its k-th nearest neighbor. In DPC it is the density rank which is relevant for the final cluster assignation.…”
Section: Block 4: Density Peak Clustering (mentioning)
confidence: 99%
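The quoted passage uses a k-nearest-neighbor density estimate of the form ρ_i ∝ k / (ω_d r_{i,k}^d) and notes that only the density rank enters the final DPC cluster assignation. The Python sketch below illustrates such a rank computation; the function name knn_density_rank and the use of the unit d-ball volume for ω_d are assumptions of this sketch, not details taken from the DPC implementation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln

def knn_density_rank(X, k, d):
    """Rank points by a k-NN density estimate of the form
    rho_i ∝ k / (omega_d * r_{i,k}^d), where omega_d is the volume of the
    d-sphere with unitary radius and r_{i,k} is the distance of point i to
    its k-th nearest neighbor. This is a sketch of the estimator the quoted
    passage describes, not the DPC reference implementation.
    """
    tree = cKDTree(X)
    dist, _ = tree.query(X, k=k + 1)      # column 0 is the point itself
    r_k = dist[:, k]
    # Volume of the d-dimensional unit ball, computed in log space for stability.
    log_omega_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    log_rho = np.log(k) - log_omega_d - d * np.log(r_k)
    # In DPC only the density *rank* matters for the final cluster assignation.
    return np.argsort(np.argsort(-log_rho))   # rank 0 = densest point
```

Because only the rank is needed, the constant factors k and ω_d do not change the result; they are kept here only to make the estimator recognizable.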