2019
DOI: 10.1038/s41598-019-53549-9

Intrinsic dimension estimation for locally undersampled data

Abstract: Identifying the minimal number of parameters needed to describe a dataset is a challenging problem known in the literature as intrinsic dimension estimation. All the existing intrinsic dimension estimators are not reliable whenever the dataset is locally undersampled, and this is at the core of the so called curse of dimensionality. Here we introduce a new intrinsic dimension estimator that leverages on simple properties of the tangent space of a manifold and extends the usual correlation integral estimator to…
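The correlation integral estimator that the abstract says is extended can be sketched as follows. This is a minimal Grassberger-Procaccia-style implementation, not the paper's method: the function name, the quantile-based fitting window, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def correlation_dimension(X, n_r=20):
    """Classic correlation-integral estimate: for points sampled from a
    d-dimensional manifold, the fraction of pairs closer than r scales as
    C(r) ~ r^d at small r, so d is the slope of log C(r) vs log r."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    pair = dist[np.triu_indices(len(X), k=1)]    # unique pairwise distances
    r_lo, r_hi = np.quantile(pair, [0.02, 0.2])  # small-r scaling window (assumed)
    rs = np.geomspace(r_lo, r_hi, n_r)
    C = np.array([(pair < r).mean() for r in rs])
    slope, _ = np.polyfit(np.log(rs), np.log(C), 1)
    return slope
```

On points drawn from a 2-D plane embedded in a higher-dimensional ambient space, the slope comes out close to 2; the paper's point is precisely that estimates of this kind degrade once the manifold is locally undersampled.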


Cited by 31 publications (29 citation statements)
References 26 publications (42 reference statements)
“…Recently, a useful parameterization of object manifolds was introduced that is amenable to analytical computations [ 28 ]; it will be described in detail below. In a data science perspective, these approaches are motivated by the empirical observation that data sets usually lie on low-dimensional manifolds, whose “intrinsic dimension” is a measure of the number of latent degrees of freedom [ 29 , 30 , 31 ].…”
Section: Introduction
confidence: 99%
“…In order to compare the conformational spaces generated by the different FFs via MD simulations, one can consider (i) using a similarity kernel on the average SOAP descriptors calculated over the whole trajectory, as we explained in the sections above, or (ii) estimating the probability densities resulting from the ensemble of the environments’ SOAP power spectra over a reduced representation and calculating a distributional distance over such densities. To obtain a comparison of type (ii), we took the original SOAP spectra and evaluated their intrinsic dimension via the TwoNN algorithm 44 and the FCI algorithm, 45 obtaining estimates of 24 and 25 dimensions, respectively, using a 3 nm cutoff. The first point was to make the gridding computationally feasible (a 2700-dimensional grid is out of reach for current computational capabilities).…”
Section: Methods
confidence: 99%
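The TwoNN estimator referenced in the excerpt above admits a very compact implementation. The sketch below is the standard maximum-likelihood form of TwoNN (Facco et al.), written from its published description; the FCI estimator of the present paper is not reproduced here, and the function name is an assumption.

```python
import numpy as np

def twonn_dimension(X):
    """TwoNN: under local uniformity, the ratio mu = r2/r1 of each point's
    second- to first-nearest-neighbor distance is Pareto-distributed with
    exponent equal to the intrinsic dimension d, giving the
    maximum-likelihood estimate d = N / sum(log mu)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)        # exclude self-distances
    r12 = np.sort(dist, axis=1)[:, :2]    # (r1, r2) for every point
    mu = r12[:, 1] / r12[:, 0]
    return len(X) / np.log(mu).sum()
```

Because it uses only the two nearest neighbors of each point, TwoNN needs far fewer samples per neighborhood than a full correlation-integral fit, which is why it appears alongside FCI in high-dimensional SOAP-spectra settings like the one quoted.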
“…In order to compare the conformational spaces generated by the different force fields via MD simulations, one can consider: (i) using a similarity kernel on the average SOAP descriptors calculated over the whole trajectory, as we explained in the sections above; or (ii) estimating the probability densities resulting from the ensemble of the environments’ SOAP power spectra over a reduced representation, and calculating a distributional distance over such densities. To obtain a comparison of type (ii), we started by estimating the intrinsic dimension of the SOAP spectra dataset via the TwoNN algorithm 63 and the FCI algorithm, 64 which yielded an estimate larger than 20 for the chosen 3 nm cut-off. To make the gridding computationally feasible, we first employed PCA to reduce the dimensionality of the dataset, and used the Pak algorithm 65 to obtain a uniform grid (details in the SI).…”
Section: SOAP Comparison
confidence: 99%
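The PCA reduction mentioned in the excerpt above (shrinking the SOAP spectra to a manageable dimension before gridding) is standard. A minimal SVD-based sketch follows; the Pak gridding step is deliberately omitted, and the function name is an assumption.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the centered data onto its top-k principal components,
    the usual preprocessing before building a grid in a space of
    manageable dimension."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = PCs
    return Xc @ Vt[:k].T
```

A typical choice is to set k near the estimated intrinsic dimension, so the grid lives in a space no larger than the data actually require.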
“…9 Despite the tremendous advance in computational capabilities observed in the last decades, classical MD at atomistic resolution is still unable to cover all the time scales of biological interest. 10 For this reason, starting from the beginning of the 90s, various models with a reduced number of degrees of freedom have been proposed: from united-atom (UA) representations, where the aliphatic hydrogen atoms are removed and their mass is added to the bonded heavy atom, 11 to coarse-grained (CG) models, where a single “CG bead” typically groups 2-5 heavy atoms, 12 to super-CG models, where a single lipid can be represented by 3-4 larger CG beads. 13 The reduction of the number of degrees of freedom provides a dramatic speed-up in the simulations, which is nonetheless accompanied by an unavoidable loss of entropic contribution (and thus accuracy), typically offset by properly adjusting the enthalpic contributions.…”
confidence: 99%