2019
DOI: 10.1080/10095020.2019.1621549

A Principal Component Analysis (PCA)-based framework for automated variable selection in geodemographic classification

Abstract: A geodemographic classification aims to describe the most salient characteristics of a small area zonal geography. However, such representations are influenced by the methodological choices made during their construction. Of particular debate are the choice and specification of input variables, with the objective of identifying inputs that add value but also aim for model parsimony. Within this context, our paper introduces a principal component analysis (PCA)-based automated variable selection methodology tha…

Cited by 34 publications (19 citation statements)
References 19 publications
“…6.1 Selecting the Appropriate Indicators Liu et al (2019) explain that the selection of variables must consider theoretical aspects, previous experiences, practicality, availability, and statistical and data properties. The latter includes eliminating outliers and poorly correlated indicators, which can compromise the cluster's structure and consistency (Kim, 2015).…”
Section: How To Reduce the K-means Shortcomings?
Citation type: mentioning
confidence: 99%
“…In this study, the above procedure was conducted for SLPs for 1- to 3-day lags and the PCs of each lag were used as model input. The PCA has been found as an efficient method for the selection of inputs from a large dataset in previous studies [57,58].…”
Section: Generation Of Model Inputs
Citation type: mentioning
confidence: 99%
“…For example, the clustering algorithms that underpin the development of the classifications will identify the 'best' groupings of the small-area geographies based on a mathematical optimisation process; however, the areas could have been grouped differently based on different input variables [37,43]. Therefore, it is critical that input variables are selected for their ability to generate relevant and meaningful groups, which has led to renewed exploration of variable selection procedures [44]. Yet, despite this, no examples in either the academic or the commercial literature have been found which suggest that any methods are employed at this stage for evaluating the contextual relevance or appropriateness of the input variables chosen.…”
Section: Proposed Alternatives To the Traditional Geodemographic Classification Development Framework
Citation type: mentioning
confidence: 99%