One of the founding paradigms of machine learning is that a small number of variables is often sufficient to describe high-dimensional data. The minimum number of variables required is called the intrinsic dimension (ID) of the data. Contrary to common intuition, there are cases where the ID varies within the same data set. This fact has been highlighted in technical discussions, but seldom exploited to analyze large data sets and obtain insight into their structure. Here we develop a robust approach to discriminate regions with different local IDs and segment the points accordingly. Our approach is computationally efficient and can be applied even to large data sets. We find that many real-world data sets contain regions with widely heterogeneous dimensions. These regions host points differing in core properties: folded versus unfolded configurations in a protein molecular dynamics trajectory, active versus non-active regions in brain imaging data, and firms with different financial risk in company balance sheets. A simple topological feature, the local ID, is thus sufficient to achieve an unsupervised segmentation of high-dimensional data, complementary to the one given by clustering algorithms.
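The abstract does not spell out the model, so the following is only a minimal sketch of how a local ID can drive segmentation, under stated assumptions. For locally uniform data of dimension d, the ratio mu = r2/r1 between a point's second- and first-nearest-neighbor distances follows a Pareto law with shape d. The sketch fits a two-component mixture of such laws by expectation-maximization and labels each point by its more probable component. It is not the authors' method: it omits any term that encourages neighboring points to share a component, so labels are noisy where the two laws overlap. The function name `pareto_mixture_em` is hypothetical.

```python
import math
import random


def _sigmoid(t):
    # Numerically stable logistic function 1 / (1 + exp(-t)).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def pareto_mixture_em(mu, n_iter=500, tol=1e-6):
    """Fit a two-component Pareto mixture to ratios mu_i = r2_i / r1_i.

    Component k assumes the density d_k * mu**(-d_k - 1) for mu > 1, i.e.
    log(mu) ~ Exponential(rate=d_k), with d_k the intrinsic dimension of that
    component. Returns (dims, weights, labels); labels[i] is the component
    with the larger responsibility for point i at the last E-step.
    """
    if any(m <= 1.0 for m in mu):
        raise ValueError("all ratios must be > 1 (remove duplicate points)")
    logs = [math.log(m) for m in mu]
    n, total = len(logs), sum(logs)
    d_pool = n / total                    # ID estimate if it were homogeneous
    dims = [0.5 * d_pool, 2.0 * d_pool]   # asymmetric start separates components
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: log-odds that each point belongs to component 0 rather than 1.
        a = math.log(weights[0] * dims[0]) - math.log(weights[1] * dims[1])
        b = dims[1] - dims[0]
        r0 = [_sigmoid(a + b * x) for x in logs]
        # M-step: weighted Pareto MLE, d_k = sum_i z_ik / sum_i z_ik * log(mu_i).
        n0 = sum(r0)
        s0 = sum(r * x for r, x in zip(r0, logs))
        new_dims = [n0 / s0, (n - n0) / (total - s0)]
        weights = [n0 / n, (n - n0) / n]
        shift = max(abs(u - v) for u, v in zip(new_dims, dims))
        dims = new_dims
        if shift < tol:
            break
    labels = [0 if r >= 0.5 else 1 for r in r0]
    return dims, weights, labels


if __name__ == "__main__":
    # Synthetic ratios: log(mu) ~ Exp(d) reproduces the Pareto law for ID d.
    random.seed(0)
    mu = [math.exp(random.expovariate(1.0)) for _ in range(2000)]
    mu += [math.exp(random.expovariate(10.0)) for _ in range(2000)]
    dims, weights, _ = pareto_mixture_em(mu)
    print("estimated IDs:", dims, "weights:", weights)
```

On real data, `mu` would come from nearest-neighbor distances computed on the points themselves; the synthetic ratios above only check that the mixture recovers two well-separated dimensions.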
Modern datasets are characterized by numerous features related by complex dependency structures. To deal with these data, dimensionality reduction techniques are essential. Many of these techniques rely on the concept of intrinsic dimension (ID), a measure of the complexity of the dataset. However, the estimation of this quantity is not trivial: often, the ID depends rather dramatically on the scale of the distances among data points. At short distances, the ID can be grossly overestimated due to the presence of noise, becoming smaller and approximately scale-independent only at large distances. An immediate approach to examining the scale dependence consists of decimating the dataset, which unavoidably induces non-negligible statistical errors at large scale. This article introduces a novel statistical method that allows estimating the ID as an explicit function of the scale without performing any decimation. Our approach is based on rigorous distributional results that enable the quantification of uncertainty of the estimates. Moreover, our method is simple and computationally efficient since it relies only on the distances among data points. Through simulation studies, we show that the method is asymptotically unbiased, provides estimates comparable to those of other state-of-the-art methods, and is more robust to short-scale noise than other likelihood-based approaches.
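The abstract does not give the estimator, so the following is a minimal sketch of one distance-only, scale-dependent ID estimate under stated assumptions, not a reproduction of the article's method. It assumes that, around each point, the data are locally uniform (a homogeneous Poisson process). Under that assumption, the ratio mu = r_n2 / r_n1 of the distances to the n2-th and n1-th nearest neighbors has a density that depends only on d, n1 and n2, which gives a concave log-likelihood in d. Raising n1 (with n2 = 2 * n1, an assumption of this sketch) pushes the ratios to larger distances. This yields an ID estimate at a coarser scale while keeping every point, with no decimation. The names `knn_distances` and `ratio_id_mle` are hypothetical, and the brute-force neighbor search is only suitable for small N.

```python
import math
import random


def knn_distances(points, k):
    """Sorted distances from each point to its k nearest neighbors (brute force, O(N^2))."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def ratio_id_mle(points, n1=1, n2=None, iters=100):
    """ID estimate from mu_i = r_{i,n2} / r_{i,n1}, assuming local Poisson uniformity.

    Log-likelihood in d, dropping terms that do not depend on d:
        N*log(d) + (n2-n1-1) * sum_i log(mu_i**d - 1) - (n2-1) * d * sum_i log(mu_i)
    It is concave in d, so the zero of its derivative is located by bisection.
    Assumes no duplicate points (every mu_i > 1).
    """
    if n2 is None:
        n2 = 2 * n1  # assumption of this sketch: double the neighbor order
    logs = [math.log(r[n2 - 1] / r[n1 - 1]) for r in knn_distances(points, n2)]
    n, total = len(logs), sum(logs)

    def score(d):
        # Derivative of the log-likelihood; mu**d / (mu**d - 1) = 1 / (1 - exp(-d*log(mu))).
        s = sum(x / -math.expm1(-d * x) for x in logs)
        return n / d + (n2 - n1 - 1) * s - (n2 - 1) * total

    lo, hi = 1e-6, 1.0
    while score(hi) > 0:  # score -> -n1 * total < 0 as d grows, so this stops
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Points on a unit square (ID 2) in 3D, with small noise off the plane.
    random.seed(1)
    pts = [(random.random(), random.random(), random.gauss(0.0, 0.01)) for _ in range(400)]
    for n1 in (1, 2, 4, 8):  # larger n1 probes larger distances
        print(n1, round(ratio_id_mle(pts, n1), 2))
```

With n1 = 1 and n2 = 2, the middle term vanishes and the estimate reduces to the closed form N / sum(log mu_i). Bisection is used instead of Newton's method only because the score is monotone, so it cannot overshoot.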
Objectives: The main purpose of this study was to describe the relationship between patellar maximal craniocaudal thickness and femoral trochlear groove depth in normal dogs, and to evaluate the intra- and inter-observer variability of maximal trochlear depth and maximal patellar craniocaudal thickness measured using computed tomography.
Methods: Trochlear groove depth and patellar maximal craniocaudal thickness of 40 limbs (20 dogs) were measured by three independent veterinarians using three-dimensional multiplanar reconstruction computed tomography images. The patellar maximal craniocaudal thickness/trochlear depth ratio was determined.
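The abstract does not name the statistic used for observer variability. A common choice, assumed here, is the intraclass correlation coefficient ICC(2,1) of Shrout and Fleiss: two-way random effects, absolute agreement, single rater. It is computed from a table with one row per subject (here, per limb) and one column per rater (here, three veterinarians). The sketch below covers inter-observer agreement only. The function name `icc_2_1` is hypothetical, and which ICC form fits a given study design is a choice this sketch does not settle.

```python
def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    table[i][j] is the measurement of subject i (e.g. one limb) by rater j.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form measures absolute agreement, a systematic offset between raters lowers it. Identical readings such as [[1, 1, 1], [2, 2, 2], [3, 3, 3]] give 1.0, while a constant per-rater offset, as in [[1, 2, 3], [2, 3, 4], [3, 4, 5]], gives 0.5.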
Results: The mean ratio for these stifles was 0.46 (range 0.24–0.70), meaning that, on average, the maximal trochlear depth was 46% of the maximal patellar thickness.
Clinical Significance: A wide range of patellar maximal craniocaudal thickness/maximal trochlear depth ratios was found, suggesting that breed studies should be performed to determine breed-specific patellar thickness/trochlear depth ratios. To decide when and where to perform a sulcoplasty during patellar luxation surgery, the patellar thickness/trochlear depth relationship should be measured for each breed, with patellar tracking from stifle hyperflexion to stifle hyperextension.