Kernel K-Means for Categorical Data

Couto, Julia

doi:10.1007/11552253_5

Cited by 17 publications

(14 citation statements)

References 9 publications


“…We chose a kernel, proposed in Couto (2005), based on the Hamming distance which measures the minimum number of substitutions required to change one observation into another one. Naturally, pgpEM and kernel k-means worked on the same kernel to have a fair … [figure omitted; axes labelled Harmonic 1 and Harmonic 2] …”

Section: Clustering Of Categorical Data: the House-vote Dataset (mentioning)

confidence: 99%
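
As a rough illustration of the Hamming distance described in the statement above, the sketch below counts the positions at which two categorical observations differ and derives a simple overlap similarity from that count. The function names and the toy voting records are hypothetical, and the overlap similarity is a generic choice that is not necessarily the kernel defined in Couto (2005).

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length observations differ."""
    if len(x) != len(y):
        raise ValueError("observations must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def overlap_kernel(x, y):
    """Fraction of matching attributes: 1 - hamming / length (assumed form)."""
    return 1.0 - hamming_distance(x, y) / len(x)


# Hypothetical voting records with three categorical attributes each.
a = ("yes", "no", "yes")
b = ("yes", "yes", "yes")

d = hamming_distance(a, b)  # one substitution turns a into b
k = overlap_kernel(a, b)    # two of three attributes match
print(d, k)
```

The overlap similarity is a sum of per-attribute indicator kernels divided by the number of attributes, so it can serve as a kernel in methods that only need pairwise similarities.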

“…It turns out that pgpEM is significantly better than kernel k-means to cluster this kind of data. To make pgpDA able to deal with such data, we built a combined kernel by mixing a kernel based on the Hamming distance (Couto 2005) for the categorical features and a RBF kernel for the quantitative data. We chose to combine both kernels simply as follows:…”

Section: Pca Function 2 (Percentage Of Variability 125) (mentioning)

confidence: 99%
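
The statement above is cut off before the authors' formula, so the following does not reproduce it. As one common way to mix kernels over different feature types, a weighted sum of two valid kernels is again a valid kernel; the sketch below assumes that form. The weight `alpha`, the bandwidth `gamma`, the function names, and the toy observations are all hypothetical.

```python
import math


def hamming_kernel(x, y):
    """Overlap similarity on categorical features (assumed form)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2) on numeric features."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def combined_kernel(p, q, alpha=0.5, gamma=1.0):
    """Convex combination of both kernels; alpha is a hypothetical weight."""
    (cat_p, num_p), (cat_q, num_q) = p, q
    categorical_part = hamming_kernel(cat_p, cat_q)
    numeric_part = rbf_kernel(num_p, num_q, gamma)
    return alpha * categorical_part + (1.0 - alpha) * numeric_part


# Each observation is a pair: (categorical features, quantitative features).
p = (("red", "small"), (1.0, 2.0))
q = (("red", "large"), (1.0, 2.0))

value = combined_kernel(p, q)  # 0.5 * 0.5 + 0.5 * 1.0
print(value)
```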


Kernel discriminant analysis and clustering with parsimonious Gaussian process models

Bouveyron, Fauvel, Girard

2014

Stat Comput


This work presents a family of parsimonious Gaussian process models which allow to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations into infinite dimensional spaces. It is also demonstrated that the building of the classifier can be directly done from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is as well extended to the unsupervised classification case. Experimental results on various data sets demonstrate the effectiveness of the proposed method.


“…For the numerical datasets, the Gaussian kernel is applied, while the Hamming kernel [7] is used for categorical datasets. For each of the three methods, we used a variety of parameter settings.…”

Section: B Quantitative Results (mentioning)

confidence: 99%

“…Most remarkably, the kernel trick adaptation also allows these inner product reliant methods to be directly applied to non-numeric or mixed-type data, once appropriate kernels have been defined for these data types. As examples, here we can mention outlier detection techniques for categorical or mixed-attribute data such as [7] and [8].…”

Section: Introduction (mentioning)

confidence: 99%

Kernel principal subspace Mahalanobis distances for outlier detection

Liu, Georgiopoulos, Anagnostopoulos

2011

The 2011 International Joint Conference on Neural Networks


Abstract: Over the last few years, Kernel Principal Component Analysis (KPCA) has found several applications in outlier detection. A relatively recent method uses KPCA to compute the reconstruction error (RE) of previously unseen samples and, via thresholding, to identify atypical samples. In this paper we propose an alternative method, which performs the same task, but considers Mahalanobis distances in the orthogonal complement of the subspace that is utilized to compute the reconstruction error. In order to illustrate its merits, we provide qualitative and quantitative results on both artificial and real datasets and we show that it is competitive, if not superior, for several outlier detection tasks, when compared to the original RE-based variant and the One-Class SVM detection approach.


“…Amir et al [61] offered a cost function and distance measure for clustering datasets with mixed data (datasets with numerical and categorical data) based on co-occurrences of values. In [62], a kernel function based on "hamming distance" [62] was proposed for embedding categorical data. The kernel-k-means provides an add-on to the k-means clustering that is designed to find clusters in a feature space where distances are calculated via kernel functions.…”

Section: K-means Variants For Solving the Problem Of Data Issue (mentioning)

confidence: 99%
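
To make the idea of clustering in a feature space through kernel evaluations concrete, here is a minimal sketch of Lloyd-style kernel k-means on a precomputed Gram matrix. It relies on the standard identity that the squared feature-space distance from point i to the mean of cluster C equals K[i][i] - (2/|C|) Σ_j K[i][j] + (1/|C|²) Σ_{j,l} K[j][l], with j and l ranging over C. The overlap kernel, the toy data, and the initial assignment are assumptions for illustration and are not taken from the cited papers.

```python
def overlap_kernel(x, y):
    """Fraction of matching categorical attributes (assumed kernel)."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def kernel_kmeans(K, labels, n_clusters, max_iter=100):
    """Kernel k-means on a Gram matrix K (list of lists), from initial labels."""
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Mean of K over all ordered pairs inside each cluster.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no mean to compare against
                cross = sum(K[i][j] for j in m) / len(m)
                dist = K[i][i] - 2.0 * cross + within[c]
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


# Toy categorical observations forming two obvious groups.
data = [("a", "a", "a"), ("a", "a", "b"), ("c", "c", "c"), ("c", "c", "d")]
K = [[overlap_kernel(x, y) for y in data] for x in data]

result = kernel_kmeans(K, labels=[0, 0, 0, 1], n_clusters=2)
print(result)
```

Like ordinary k-means, this procedure only reaches a local optimum, so the outcome depends on the initial assignment; in practice several restarts are usually compared.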

The k-means Algorithm: A Comprehensive Survey and Performance Evaluation

2020


The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algorithm has certain limitations, including problems associated with random initialization of the centroids which leads to unexpected convergence. Additionally, such a clustering algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithms including their recent developments are discussed, where their effectiveness is investigated based on the experimental analysis of a variety of datasets. The detailed experimental analysis along with a thorough comparison among different k-means clustering algorithms differentiates our work compared to other existing survey papers. Furthermore, it outlines a clear and thorough understanding of the k-means algorithm along with its different research directions.
