2008
DOI: 10.1080/00273170701836695

A New Variable Weighting and Selection Procedure for K-means Cluster Analysis

Abstract: A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these procedures are demonstrated in a simulation study, showing favorable results when compared with existing standardization methods. A detailed demonstration of the weight…
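
The abstract names a variance-to-range ratio but does not state its formula, so the following is only a minimal sketch under stated assumptions, not the paper's confirmed procedure. It assumes the ratio for a variable is 12 × variance / range², a scaling under which a uniformly spread variable scores roughly 1, and that weights are expressed relative to the smallest ratio. The function names `variance_to_range_ratio` and `relative_weights` and the toy data are hypothetical.

```python
from statistics import pvariance


def variance_to_range_ratio(column):
    """Return 12 * variance / range**2 for one variable.

    Assumption: this scaling is chosen for illustration; a variable spread
    evenly over its range scores near 1, a clumped (bimodal) one scores higher.
    """
    spread = max(column) - min(column)
    if spread == 0:
        return 0.0  # a constant variable cannot separate clusters
    return 12.0 * pvariance(column) / spread ** 2


def relative_weights(columns):
    """Express each variable's ratio relative to the smallest positive ratio."""
    ratios = [variance_to_range_ratio(c) for c in columns]
    positive = [r for r in ratios if r > 0]
    base = min(positive) if positive else 1.0
    return [r / base for r in ratios]


# Toy data: one variable with two tight groups, one spread evenly.
bimodal = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]
evenly_spread = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]

weights = relative_weights([bimodal, evenly_spread])
print(weights)
```

In this toy case the bimodal variable receives the larger weight, which matches the intuition that variables showing cluster structure vary more, relative to their range, than variables without it. Weights of this kind would typically be applied to the variables before running K-means.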

Citations: cited by 70 publications (77 citation statements)
References: 31 publications
“…It is therefore worthwhile carefully selecting variables to be included in the segmentation base, rather than including an entire question battery by default. Noisy variables in the segmentation base can be avoided by (1) identifying and removing them after data collection (Brusco and Cradit 2001; Carmone, Kara, and Maxwell 1999; Steinley and Brusco 2008) or by (2) ensuring, before data collection, that survey questions are only included if they contain relevant information, as advocated by Rossiter (2002, 2011). Methods for identifying and removing can either be employed before the clustering using characteristics of the distribution of the single variables (Steinley and Brusco 2008) or simultaneously during clustering, by taking into account the concordance and agreement between cluster solutions implied by different variables (see Brusco and Cradit 2001, who directly build on and improve Carmone, Kara, and Maxwell 1999).…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…For this any clustering algorithm on multivariate data can be used. Since the time complexity of this step depends on the number of observation N and of variables P we decide to screen out irrelevant features using a feature selection algorithm for unsupervised learning described in [33]. Besides the reduction of the computation time, feature selection allows also to gain in interpretability of the clustering since it highly depends on the data.…”
Section: Clustering Electrical Load Curves
Citation type: mentioning
confidence: 99%
“…According to Steinley & Brusco (2008a) and Anzanello & Fogliatto (2011), a larger variance suggests more dispersed variables and, consequently, variables with a greater capacity to differentiate observations into groups than variables with smaller variances. The number of components to be retained can be determined through the scree graph or cross-validation (Duda et al., 2001).…”
Section: Step 2 - Application of PCA to the Remapped Data and Generation of the…
Citation type: unclassified