2020
DOI: 10.1038/s41598-020-72222-0

Data segmentation based on the local intrinsic dimension

Abstract: One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discri…

Cited by 23 publications (30 citation statements)
References 38 publications (75 reference statements)
“…Further assessment of the statistical validity of taking separated centroids as representative of ecological clustering in a PCA setting requires additional work (possibly using recent and promising methods based on the evaluation of the local intrinsic dimension of the data (Allegra et al. 2020)).…”
Section: Results | Citation type: mentioning
confidence: 99%
“…However these results were derived for the simplest uniform euclidean manifold with single global intrinsic dimension, they form a base for application in more complex cases. For example the pdf of the local statistic make possible to apply the FSA estimator within mixture-based approaches, this would provide better ID estimates when the ID is varying in the data set (Haro, Randall & Sapiro, 2008; Allegra et al, 2020).…”
Section: Discussion | Citation type: mentioning
confidence: 99%
“…In 3D, this approach can be used for object detection (see Figure 1), but it can be generalized for higher-dimensional data point clouds. Interestingly, local ID can be related to various object characteristics in various domains: folded versus unfolded configurations in a protein molecular dynamics trajectory, active versus non-active regions in brain imaging data, and firms with different financial risk in company balance sheets [74].…”
Section: Discussion | Citation type: mentioning
confidence: 99%