Spectral Clustering Using PCKID – A Probabilistic Cluster Kernel for Incomplete Data

Løkse, Sigurd; Bianchi, Filippo Maria; Salberg, Arnt-Børre; Jenssen, Robert

doi:10.1007/978-3-319-59126-1_36

Cited by 6 publications

(4 citation statements)

References 20 publications

“…However, Souto et al [33] later argued that from their experiments, this superiority is non-existent, backing up their conclusion with the rationale that gene expression being highly correlated and characterized by very close values, imputing with a mean will have minimal effect on the shape of the data's distribution. Løkse et al [34] introduced a new kernel function which learns the similarities between data points from the data's fitted mixture models, inherently taking care of the missing value problem. They then use this kernel function for spectral clustering, performing kmeans clustering on the spectral clustering output.…”

Section: Multi-stage Clustering (mentioning)

confidence: 99%
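The spectral clustering step described in the statement above (eigen-decomposition of a kernel matrix followed by k-means on the resulting embedding) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function name `spectral_cluster` is hypothetical, the normalization follows the common symmetric-normalized variant, and a toy RBF kernel on complete data stands in for PCKID, whose handling of missing values is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans


def spectral_cluster(K, n_clusters, seed=0):
    """Cluster points from a precomputed kernel (similarity) matrix K."""
    K = 0.5 * (K + K.T)                                   # enforce symmetry
    d = K.sum(axis=1)                                     # degree of each point
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Symmetric normalization: D^{-1/2} K D^{-1/2}
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(A)                        # eigenvalues in ascending order
    U = eigvecs[:, -n_clusters:]                          # eigenvectors of the largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # k-means on the rows of the spectral embedding
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(U)


# Toy data: two well-separated blobs (assumption, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])

# Stand-in kernel: plain RBF, NOT the PCKID kernel.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_toy = np.exp(-sq_dists / 2.0)

labels = spectral_cluster(K_toy, n_clusters=2)
```

Any positive semi-definite similarity matrix can be passed in place of `K_toy`, which is where a kernel learned from fitted mixture models would enter.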

Cluster Analysis of Mixed and Missing Chronic Kidney Disease Data in KwaZulu-Natal Province, South Africa

2021

“…The PCK has previously been used for semi-supervised learning [22] and spectral clustering [23]. Additionally, variations of the method for handling missing data have been proposed for both time series [38] and vectorial data [35].…”

Section: Probabilistic Cluster Kernel (mentioning)

confidence: 99%

“…We train PCK by fitting GMMs on a subset of 200 training samples using parameters Q = G = 30. These parameters are sufficiently large to ensure robust results [35]. Once trained, the GMM models are applied to the remaining data to calculate the whole kernel matrix.…”

Section: Experimental Setting (mentioning)

confidence: 99%
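The training procedure in the statement above (fit GMMs on a training subset, then apply them to all data to build the kernel matrix) might look roughly like the sketch below. It assumes the usual PCK construction, in which the kernel sums inner products of GMM posterior vectors over Q random initializations and over mixtures with 2 to G components; the function name `pck_kernel`, the diagonal covariances, and the `reg_covar` value are illustrative assumptions, and the data are assumed complete.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def pck_kernel(X, n_train=200, Q=30, G=30, seed=0):
    """Sketch of a probabilistic cluster kernel on complete data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Fit the mixtures on a random training subset only.
    train_idx = rng.choice(n, size=min(n_train, n), replace=False)
    X_train = X[train_idx]

    K = np.zeros((n, n))
    for q in range(Q):                      # Q random initializations
        for g in range(2, G + 1):           # mixtures with 2..G components
            gmm = GaussianMixture(
                n_components=g,
                covariance_type="diag",     # assumption: diagonal covariances
                reg_covar=1e-3,             # assumption: small regularizer for stability
                random_state=seed + q * G + g,
            )
            gmm.fit(X_train)
            P = gmm.predict_proba(X)        # posteriors for ALL samples, shape (n, g)
            K += P @ P.T                    # accumulate posterior inner products
    return K


# Small demo with reduced Q and G so it runs quickly (values are illustrative).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(4.0, 0.5, (30, 2))])
K_demo = pck_kernel(X_demo, n_train=40, Q=2, G=4)
```

Because each term `P @ P.T` is a Gram matrix, the accumulated kernel is positive semi-definite by construction; larger Q and G average over more mixture configurations at a proportionally higher fitting cost.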

The deep kernelized autoencoder

Kampffmeyer, Løkse, Bianchi et al. 2018

Applied Soft Computing

Self Cite

Autoencoders learn data representations (codes) in such a way that the input is reproduced at the output of the network. However, it is not always clear what kind of properties of the input data need to be captured by the codes. Kernel machines have experienced great success by operating via inner products in a theoretically well-defined reproducing kernel Hilbert space, hence capturing topological properties of input data. In this paper, we enhance the autoencoder's ability to learn effective data representations by aligning inner products between codes with respect to a kernel matrix. By doing so, the proposed kernelized autoencoder allows learning similarity-preserving embeddings of input data, where the notion of similarity is explicitly controlled by the user and encoded in a positive semi-definite kernel matrix. Experiments are performed for evaluating both reconstruction and kernel alignment performance in classification tasks and visualization of high-dimensional data. Additionally, we show that our method is capable of emulating kernel principal component analysis on a denoising task, obtaining competitive results at a much lower computational cost.
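The idea of aligning inner products between codes with a kernel matrix can be illustrated with a small loss function. This is a hedged sketch under one plausible reading of the abstract: the function name `alignment_loss` and the choice of a normalized Frobenius distance are assumptions, not a statement of the paper's exact objective.

```python
import numpy as np


def alignment_loss(codes, K):
    """Frobenius distance between the normalized code Gram matrix and kernel K."""
    C = codes @ codes.T                       # inner products between codes
    C_norm = C / np.linalg.norm(C)            # matrix norm defaults to Frobenius
    K_norm = K / np.linalg.norm(K)
    return np.linalg.norm(C_norm - K_norm)


rng = np.random.default_rng(0)
codes = rng.normal(size=(10, 3))

# A kernel proportional to the code Gram matrix is perfectly aligned.
loss_aligned = alignment_loss(codes, 5.0 * (codes @ codes.T))

# An unrelated kernel (identity) gives a strictly positive loss.
loss_unrelated = alignment_loss(codes, np.eye(10))
```

In a training loop, a term of this kind would be added to the reconstruction loss so that the learned codes preserve the similarities encoded in the chosen kernel.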

“…By doing so, the entire framework is grounded within the theoretically well understood kernel methods. Moreover, spectral clustering is considered a state-of-the-art clustering algorithm and has been successfully utilized in many applications [25,26,27].…”

Section: Introduction (mentioning)

confidence: 99%

An Unsupervised Multivariate Time Series Kernel Approach for Identifying Patients with Surgical Site Infection from Blood Samples

Mikalsen, Soguero-Ruíz, Bianchi et al. 2018

Preprint

Self Cite

A large fraction of electronic health records consists of clinical measurements collected over time, such as blood tests, which provide important information about the health status of a patient. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and the presence of missing data, which complicate analysis. In this work, we propose a surgical site infection detection framework for patients undergoing colorectal cancer surgery that is completely unsupervised, hence alleviating the problem of getting access to labelled training data. The framework is based on powerful kernels for multivariate time series that account for missing data when computing similarities. Our approach shows superior performance compared to baselines that have to resort to imputation techniques and performs comparably to a supervised classification baseline.
