Mahalanobis distance informed by clustering

Lahav, Almog; Talmon, Ronen; Kluger, Yuval

doi:10.1093/imaiai/iay011

Cited by 8 publications

(4 citation statements)

References 24 publications


“…categorical variable, then a two-stage cluster analysis is recommended. 3 It is especially important to take into account whether there are nonstandard observations and whether they should be excluded, as well as whether standardization of variables is required. Non-standard observations can be observations that are not representative of the population, but that are representative of the specific sample and research problem.…”

Section: Research Design In Cluster Analysis

mentioning

confidence: 99%

Cluster methods in function better selection

Mijanović

2024

MOJSM


Cluster analysis methods, also known as taxonomic methods, are intended for grouping objects and subjects according to certain characteristics, attributes and properties. Cluster analysis looks at relevant objects and attributes, classifying them into two or more independent groups. Cluster analysis supplemented with discriminant analysis is used in confirmatory and fundamental research. In numerous statistical-methodological procedures, these methods are applied when setting up and testing various hypotheses. Grouping methods are particularly useful in the process of different selections with the aim of forming coherent groups, which may or may not necessarily be statistically different. There are several models of clustering (grouping), always with one goal, which is greater proximity (similarity) of an entity belonging to a group compared to an entity belonging to another group. Two basic grouping models are recognizable, Hierarchical and Non-Hierarchical. Both models have the same goal, which is the formation of several independent homogeneous groups from one common group of entities. The hierarchical approach does not define the number of clusters in advance (a priori), in contrast to the Non-Hierarchical Model which defines in advance number of clusters. The grouping model is chosen depending on the specific problem and the set goal of grouping. In the process, several different models are often applied, and then one is chosen as in this research. It is important to point out that the theoretical number of clusters (groups) is often not realistically applicable in practice. Using the example of this research, it was proven that the first grouping was not a good solution. Through the subsequent, second and third iteration, as well as the application of additional discriminative methods, three optimal clusters were determined in the population of girls and boys. 
Satisfactory optimal grouping was obtained on the basis of gender criteria and achieved results on psycho-motor tests.
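The contrast the abstract draws between the two grouping models can be illustrated with a minimal sketch. It assumes a made-up one-dimensional data set and uses k-means and single-linkage agglomeration as stand-ins for the non-hierarchical and hierarchical models; the function names `kmeans_1d`, `single_linkage_merges` and `cut` are hypothetical and are not taken from the cited study.

```python
from itertools import combinations

# Toy 1-D scores (illustrative values only, not data from the cited study).
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.9, 10.2, 10.0]

def kmeans_1d(values, k, iters=50):
    """Non-hierarchical model: the number of clusters k is fixed in advance."""
    srt = sorted(values)
    # Spread the k initial centers across the sorted data.
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as its group mean; keep the old center if empty.
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [sorted(g) for g in groups]

def single_linkage_merges(values):
    """Hierarchical model: record every merge level, from n clusters down to 1."""
    clusters = [[v] for v in values]
    history = [[sorted(c) for c in clusters]]
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(abs(a - b)
                              for a in clusters[p[0]] for b in clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        history.append([sorted(c) for c in clusters])
    return history

def cut(history, k):
    """Choose the number of clusters afterwards by picking one level."""
    return sorted(history[len(history) - k])

flat = kmeans_1d(values, 3)                      # k chosen before clustering
tree_cut = cut(single_linkage_merges(values), 3)  # k chosen after clustering
```

On well-separated data such as this, both routes arrive at the same three groups; the difference lies in when the number of clusters has to be decided.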



“…The Moore-Penrose pseudo-inverse $W^{-}$ is commonly used in cases where the covariance matrix is not invertible, see Wei et al [41] and Lahav et al [22], for example. This pseudo-inverse is constructed using the nonzero eigenvalues and corresponding eigenvectors of the covariance matrix $W$, and satisfies the four Moore-Penrose conditions [17].…”

Section: Introduction

mentioning

confidence: 99%
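The construction described in the statement above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not code from either cited paper: the helper names `pinv_from_eig` and `penrose_conditions_hold`, the tolerance `tol`, and the toy data are all hypothetical. The sample is made degenerate on purpose so that the covariance matrix W has no ordinary inverse.

```python
import numpy as np

def pinv_from_eig(W, tol=1e-10):
    """Pseudo-inverse of a symmetric PSD matrix from its nonzero eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(W)      # W is symmetric, so eigh applies
    keep = eigvals > tol * eigvals.max()      # treat tiny eigenvalues as zero
    V = eigvecs[:, keep]
    # Sum over retained pairs of (1 / lambda_i) * v_i v_i^T.
    return (V / eigvals[keep]) @ V.T

def penrose_conditions_hold(A, G, atol=1e-8):
    """Check the four Moore-Penrose conditions for a candidate pseudo-inverse G."""
    return bool(
        np.allclose(A @ G @ A, A, atol=atol)
        and np.allclose(G @ A @ G, G, atol=atol)
        and np.allclose((A @ G).T, A @ G, atol=atol)
        and np.allclose((G @ A).T, G @ A, atol=atol)
    )

# Toy degenerate sample (assumption): the third coordinate is the sum of the
# first two, so the 3x3 sample covariance has rank 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X = np.column_stack([X, X[:, 0] + X[:, 1]])
W = np.cov(X, rowvar=False)

W_pinv = pinv_from_eig(W)
ok = penrose_conditions_hold(W, W_pinv)
```

The relative threshold `tol` decides which eigenvalues count as nonzero; in floating point the "zero" eigenvalue of a rank-deficient covariance is usually a tiny nonzero number, so some cutoff of this kind is needed.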

“…Efficiencies (21) and (22) for different k, with three different sets of eigenvalues of the covariance matrix W as given in Table 1 in Sect. 3.1.…”

mentioning

confidence: 99%

Simplicial and Minimal-Variance Distances in Multivariate Data Analysis

Gillard

O’Riordan

Zhigljavsky

2022

J Stat Theory Pract


In this paper, we study the behaviour of the so-called k-simplicial distances and k-minimal-variance distances between a point and a sample. The family of k-simplicial distances includes the Euclidean distance, the Mahalanobis distance, Oja’s simplex distance and many others. We give recommendations about the choice of parameters used to calculate the distances, including the size of the sub-sample of simplices used to improve computation time, if needed. We introduce a new family of distances which we call k-minimal-variance distances. Each of these distances is constructed using polynomials in the sample covariance matrix, with the aim of providing an alternative to the inverse covariance matrix, that is applicable when data is degenerate. We explore some applications of the considered distances, including outlier detection and clustering, and compare how the behaviour of the distances is affected for different parameter choices.
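As a point of reference for the distances named in this abstract, the following sketch computes the ordinary Mahalanobis distance between a point and a sample, using a pseudo-inverse of the sample covariance so that it still returns a value when the data is degenerate. It does not implement the k-simplicial or k-minimal-variance distances of the paper; the function name `mahalanobis_to_sample`, the cutoff `rcond`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def mahalanobis_to_sample(x, X, rcond=1e-10):
    """Mahalanobis distance from point x to the sample X (rows = observations)."""
    mu = X.mean(axis=0)
    W = np.cov(X, rowvar=False)
    # The pseudo-inverse coincides with the inverse when W is invertible
    # and remains defined when W is singular.
    W_pinv = np.linalg.pinv(W, rcond=rcond, hermitian=True)
    d = x - mu
    # Guard against a tiny negative value caused by rounding.
    return float(np.sqrt(max(d @ W_pinv @ d, 0.0)))

# Toy sample (assumption): wide spread along the first axis, narrow along the second.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
mu = X.mean(axis=0)

a = mu + np.array([3.0, 0.0])   # 3 units away along the wide direction
b = mu + np.array([0.0, 3.0])   # 3 units away along the narrow direction

d_a = mahalanobis_to_sample(a, X)
d_b = mahalanobis_to_sample(b, X)
```

Both points lie at the same Euclidean distance from the sample mean, but `b` is far more unusual relative to the spread of the sample, which is why a covariance-aware distance is the natural choice for outlier detection.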

