2014
DOI: 10.1111/exsy.12082
Feature selection for clustering categorical data with an embedded modelling approach

Abstract: Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modificati…

Cited by 38 publications (12 citation statements); references 30 publications (42 reference statements).
“…Coding criteria usually are used for comparing two models (like AIC or BIC criteria). Silvestre et al (2015) showed how to apply the MML criterion simultaneously with a clustering method. This is similar to our algorithm, which reduces redundant clusters on-line.…”
Section: Model Selection Criteria
confidence: 99%
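
The distinction drawn in this statement, between coding criteria such as AIC or BIC used to compare already fitted models and the MML criterion applied inside the clustering procedure itself, can be made concrete with a small, hedged sketch of the first case: candidate mixture models are scored by penalizing their maximized log-likelihoods, and the lowest score is kept. The values and names below (aic, bic, candidates) are purely illustrative and are not taken from the cited papers.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2*logL + 2*k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: -2*logL + k*ln(n) (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

# Hypothetical candidates: (number of clusters, maximized log-likelihood, free parameters).
candidates = [
    (2, -1540.2, 21),
    (3, -1498.7, 32),
    (4, -1490.1, 43),
]
n_samples = 500

# Keep the candidate with the smallest BIC; AIC would be used the same way.
best = min(candidates, key=lambda c: bic(c[1], c[2], n_samples))
print("selected number of clusters:", best[0])
```

The MML approach cited above differs in that the penalty is part of the objective being optimized, so redundant components can be pruned during fitting rather than after it.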
“…On the other hand, the selection of features that minimize redundancy is superior to feature reduction in terms of interpretability (Alelyani Salem et al, 2013) and performance (Ronan et al, 2016). Problems like the one our method is concerned with, binary feature selection for clustering, have rarely been addressed though, while most of the studies have focussed on numerical variables (Silvestre, Cardoso, & Figueiredo, 2015). To our knowledge, only a handful of works did explore clustering in the presence of categorical (thus also binary, in particular) data (Bontemps & Toussile, 2013;Silvestre et al, 2015).…”
Section: On the Strengths and Limitations of the Algorithm
confidence: 99%
“…Problems like the one our method is concerned with, binary feature selection for clustering, have rarely been addressed though, while most of the studies have focussed on numerical variables (Silvestre, Cardoso, & Figueiredo, 2015). To our knowledge, only a handful of works did explore clustering in the presence of categorical (thus also binary, in particular) data (Bontemps & Toussile, 2013;Silvestre et al, 2015). The methods therein developed make certain assumptions on the data and only solve the feature selection problem by simultaneously targeting a distribution in the desired number of clusters, which would not straightforwardly align with the rest of the pipeline in our algorithm.…”
Section: On the Strengths and Limitations of the Algorithm
confidence: 99%
“…In 'Feature selection for clustering categorical data with an embedded modelling approach', Silvestre et al (2014) present a novel approach that simultaneously clusters categorical data and selects relevant features. The approach is based on a Gaussian mixture model, where the minimum message length criterion is used to guide the selection of the relevant features and a modified expectation-maximization algorithm estimates the model parameters.…”
Section: Contents of the Special Issue
confidence: 99%
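
To give a rough sense of the kind of procedure summarized in this editorial statement, a finite mixture fitted by a modified EM algorithm under an MML criterion, the sketch below implements only the plain EM backbone for a mixture of independent Bernoulli features on binary data. It is written under my own simplifying assumptions: the feature-saliency variables and the minimum message length penalty that perform the embedded feature selection in Silvestre et al. are deliberately omitted, and all names (bernoulli_mixture_em, theta, resp) are illustrative rather than taken from the paper.

```python
import numpy as np

def bernoulli_mixture_em(X, n_clusters, n_iter=100, seed=0, eps=1e-9):
    """Plain EM for a mixture of independent Bernoulli features.

    Only the clustering backbone is shown; the embedded approach described
    above additionally attaches a saliency weight to each feature and an
    MML penalty that prunes irrelevant features and empty components.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    weights = np.full(n_clusters, 1.0 / n_clusters)         # mixing proportions
    theta = rng.uniform(0.25, 0.75, size=(n_clusters, d))   # Bernoulli parameters

    for _ in range(n_iter):
        # E-step: responsibilities, computed via logs for numerical stability.
        log_resp = (X @ np.log(theta + eps).T
                    + (1.0 - X) @ np.log(1.0 - theta + eps).T
                    + np.log(weights + eps))
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate mixing proportions and Bernoulli parameters.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        theta = (resp.T @ X) / nk[:, None]

    return weights, theta, resp

# Tiny synthetic check: two obvious groups of binary profiles.
X = np.array([[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10, dtype=float)
w, t, r = bernoulli_mixture_em(X, n_clusters=2)
print(np.round(w, 2))  # estimated mixing proportions
print(np.round(t, 2))  # per-cluster Bernoulli parameters
```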