2019
DOI: 10.1007/s11634-019-00356-9

Robust and sparse k-means clustering for high-dimensional data

Abstract: In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise variables. Therefore, there is a need for a clustering method which is capable of revealing the group structure in data containing both outliers and noise variables without any pre-knowledge. In this paper, we propose a k-means-based algorithm incorporating a weighting function which leads to an automatic weight assignment for each observation. In order to cope w…
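The abstract's central idea, re-weighting each observation so that likely outliers contribute less to the center updates, can be illustrated with a minimal sketch. The exponential weighting function and the per-iteration update rule below are placeholders chosen for illustration, not the paper's actual algorithm:

```python
import numpy as np

def weighted_kmeans(X, k, n_iter=50, rng=None):
    """Illustrative k-means variant that down-weights likely outliers.

    Observation weights are recomputed each iteration from the distance
    to the nearest center; centers are then weighted means. The specific
    weighting function here is an assumption for illustration only.
    """
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Pairwise distances: shape (n_samples, k)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        dmin = d.min(axis=1)
        # Observations far from their center (relative to the median
        # distance) receive exponentially smaller weights
        w = np.exp(-(dmin / (np.median(dmin) + 1e-12)) ** 2)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(X[mask], axis=0, weights=w[mask])
    return labels, centers, w
```

Observations assigned weights near zero are effectively treated as outliers, which is the behavior the abstract describes as "automatic weight assignment for each observation".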

Cited by 30 publications (52 citation statements)
References 34 publications
“…Estimating the number of clusters in a data set is challenging; however, methods such as the gap statistic can be added to the workflow for choosing the number of clusters (Tibshirani et al., 2001). Additionally, recent approaches to clustering, such as robust (weighted) sparse k-means clustering, have the advantage of simultaneously identifying clusters and informative features for partitioning the data that can be used in feature selection (Brodinová et al., 2019). Finally, growth mixture models for cluster analysis of longitudinal data may be more suitable for data analysis from studies that include a series of sequential measurements of cortical development (Wei et al., 2017).…”
Section: Discussion
confidence: 99%
“…Nonetheless, clustering methods are advancing and therefore we do not advocate that ascendant hierarchical clustering is the only method applied on further datasets. For example, a recent paper advances upon k-means clustering to account for outliers and noise variables (Brodinová et al., 2019). As always with analysis of datasets, it is necessary to explore the available tools to find an appropriate choice.…”
Section: Discussion
confidence: 99%
“…QE is the average distance between each node and its best matching unit (BMU), while TE measures the wellness of the map structure by calculating the node's first and second BMUs and their position in relation to each other (Villmann et al., 1997; Kohonen, 2001; Breard, 2017). Smaller QE and TE values indicate a better fit of the map itself (Kohonen, 2001; Breard, 2017). Once the SOM has been trained, the data was visualized into a U-matrix (unified distance matrix) along with eight component planes.…”
Section: Self-organizing Maps
confidence: 99%
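The QE and TE definitions quoted above can be computed directly from a trained SOM's codebook. A minimal sketch, assuming the codebook vectors and their integer grid coordinates are given, and that TE counts samples whose first and second BMUs are not immediate grid neighbours (a common convention; exact definitions vary across implementations):

```python
import numpy as np

def som_errors(X, codebook, grid):
    """Quantization error (QE) and topographic error (TE) for a trained SOM.

    X:        (n_samples, n_features) data
    codebook: (n_nodes, n_features) node weight vectors
    grid:     (n_nodes, 2) integer map coordinates of each node
    """
    # Distance of every sample to every node: shape (n_samples, n_nodes)
    d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
    order = d.argsort(axis=1)
    bmu1, bmu2 = order[:, 0], order[:, 1]
    # QE: mean distance from each sample to its best matching unit
    qe = d[np.arange(len(X)), bmu1].mean()
    # TE: fraction of samples whose two BMUs are not grid-adjacent
    # (Chebyshev distance 1 on the grid means adjacent, incl. diagonals)
    adjacent = np.abs(grid[bmu1] - grid[bmu2]).max(axis=1) <= 1
    te = 1.0 - adjacent.mean()
    return qe, te
```

Smaller values of both quantities indicate a better fit, as the quoted passage notes.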
“…In this case, a gap statistic method was used. The gap statistic evaluates the dataset and provides the highest possible number of clusters suitable for the analysis (Tibshirani et al., 2001; Brodinová et al., 2019). After the gap value was calculated, the accurate k value was then applied to the k-means method.…”
Section: Principal Component Analysis (PCA) and Cluster Analysis
confidence: 99%
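The workflow quoted above (choose k with the gap statistic, then run k-means with that k) can be sketched in plain NumPy. The `kmeans` helper and the number of reference sets are illustrative assumptions; the selection rule follows Tibshirani et al. (2001): pick the smallest k with gap(k) ≥ gap(k+1) − s(k+1):

```python
import numpy as np

def kmeans(X, k, n_iter=50, rng=None):
    """Plain Lloyd's k-means (illustrative helper)."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def pooled_within_ss(X, labels, centers):
    """Pooled within-cluster sum of squares W_k."""
    return sum(((X[labels == j] - c) ** 2).sum() for j, c in enumerate(centers))

def gap_statistic(X, k_max=6, n_ref=10, rng=0):
    """Choose k by comparing log(W_k) on the data against uniform
    reference data drawn from the bounding box of X."""
    rng = np.random.default_rng(rng)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = [], []
    for k in range(1, k_max + 1):
        labels, centers = kmeans(X, k, rng=rng)
        log_w = np.log(pooled_within_ss(X, labels, centers))
        ref = []
        for _ in range(n_ref):
            R = rng.uniform(lo, hi, size=X.shape)
            rl, rc = kmeans(R, k, rng=rng)
            ref.append(np.log(pooled_within_ss(R, rl, rc)))
        gaps.append(np.mean(ref) - log_w)
        s.append(np.std(ref) * np.sqrt(1 + 1 / n_ref))
    # Smallest k whose gap is within one adjusted sd of the next gap
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps
    return k_max, gaps
```

The chosen k would then be passed to the final k-means run, as described in the quoted passage.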