Clustering Large Datasets by Merging K-Means Solutions

Melnykov, Volodymyr; Michael, Semhar

doi:10.1007/s00357-019-09314-8

Cited by 15 publications

(10 citation statements)

References 26 publications


“…The K-means algorithm is one of the most popular hierarchical algorithms and uses the minimum sum of squares to assign observations to groups. Such groups of data points are called clusters [46,47]. Observations [are] allocated to the closest cluster, and the distance between an observation and a cluster is calculated from the Euclidean distance between the observation and the cluster center.…”

Section: Methods (mentioning)

confidence: 99%

Cross-Shore Profile Evolution after an Extreme Erosion Event—Palanga, Lithuania

Kelpšaitė-Rimkienė, Parnell, Žaromskis, et al. 2021

JMSE


We report cross-shore profile evolution at Palanga, eastern Baltic Sea, where short period waves dominate. Cross-shore profile studies began directly after a significant coastal erosion event caused by storm “Anatol”, in December of 1999, and continued for a year. Further measurements were undertaken sixteen years later. Cross-shore profile changes were described, and cross-shore transport rates were calculated. A K-means clustering technique was applied to determine sections of the profile with the same development tendencies. Profile evolution was strongly influenced by the depth of closure which is constrained by a moraine layer, and the presence of a groyne. The method used divided the profile into four clusters: the first cluster in the deepest water represents profile evolution limited by the depth of closure, and the second and third are mainly affected by processes induced by wind, wave and water level changes. The most intensive sediment volume changes were observed directly after the coastal erosion event. The largest sand accumulation was in the fourth profile cluster, which includes the upper beach and dunes. Seaward extension of the dune system caused a narrowing of the visible beach, which has led to an increased sand volume (accretion) being misinterpreted as erosion.
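The K-means step described in the citation statement above (assign each observation to the closest center by Euclidean distance, then reduce the within-cluster sum of squares) can be sketched as follows. This is a minimal illustration of the standard Lloyd-style iteration, not the citing authors' code; the function names and the toy data are assumptions made for the example.

```python
import math

def assign_to_nearest(points, centers):
    """Return, for each point, the index of its closest center (Euclidean)."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

def update_centers(points, labels, centers):
    """Recompute each center as the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # An empty cluster keeps its previous center.
            new_centers.append(tuple(old))
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers

def within_ss(points, labels, centers):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign_to_nearest(points, centers)
        new_centers = update_centers(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return assign_to_nearest(points, centers), centers

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, [(0, 0), (10, 10)])
```

The starting centers are supplied by the caller here; in practice the result depends on that initialization, which is one reason K-means is usually run from several starts.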



“…Unlike existing robust methods, our proposal identifies general-shaped clusters and tolerates data contamination in a computationally efficient manner. It builds upon existing works based on two-step clustering, where a preliminary model-based algorithm is followed by a hierarchical agglomeration phase [31,34]. It thus inherits their properties but, unlike existing hybrid methods that lack robustness (i.e., only a pre-processing step is proposed in [34]), it can also detect and discard arbitrary forms of contamination.…”

Section: Motivation (mentioning)

confidence: 99%

Tk-merge: Computationally Efficient Robust Clustering Under General Assumptions

Insolia, Perrotta 2022

Preprint


We address general-shaped clustering problems under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies the clusters also in presence of data contamination. We also present natural generalizations of the approach as well as an adaptive procedure to estimate the amount of contamination in a data-driven fashion. Our proposal outperforms state-of-the-art robust, model-based methods in our numerical simulations and real-world applications related to color quantization for image analysis, human mobility patterns based on GPS data, biomedical images of diabetic retinopathy, and functional data across weather stations.
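The two-step idea mentioned here (many small k-means components first, then hierarchical agglomeration of those components) can be illustrated with a simplified sketch. It assumes the k-means centers are already available and merges them by single linkage on Euclidean distance between centers. That criterion is a stand-in chosen for brevity: it is neither the trimmed k-means of Tk-merge nor the overlap-based merging of the cited article, and `merge_centers` is a hypothetical name.

```python
import math

def merge_centers(centers, n_groups):
    """Single-linkage agglomeration of k-means centers into n_groups groups.

    Returns a list that maps each center index to a merged-group id.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[i] for i in range(len(centers))]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single linkage: distance between the closest pair of centers.
                d = min(
                    math.dist(centers[i], centers[j])
                    for i in groups[a]
                    for j in groups[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # b > a, so removing group b does not shift the position of group a.
        groups[a].extend(groups.pop(b))
    mapping = [None] * len(centers)
    for gid, members in enumerate(groups):
        for i in members:
            mapping[i] = gid
    return mapping

# Hypothetical centers from an over-segmented k-means run: an elongated
# cluster covered by three centers, and a second cluster covered by two.
centers = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
mapping = merge_centers(centers, n_groups=2)

# Relabel points: each point inherits the merged group of its k-means center.
point_labels = [0, 2, 1, 4, 3]
merged_labels = [mapping[lab] for lab in point_labels]
```

Because only the centers are compared, the merging cost depends on the number of components rather than the number of observations, which is what makes this kind of hybrid attractive for large datasets.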


“…The authors in [42] introduced DEMP-k (Directly Estimated Misclassification Probabilities), which is a combination of the HoSC-K-means (Homoscedastic Spherical Components) and hierarchical linkage functions, thereby increasing the speed and performance of the algorithm. Their work proposed a framework for hierarchical merging based on pairwise overlap between components, this was further applied to the K-means algorithm.…”

Section: Related Work On Computer Vision and Image Clustering (mentioning)

confidence: 99%

An Instance Segmentation and Clustering Model for Energy Audit Assessments in Built Environments: A Multi-Stage Approach

Arjoune, Peri, Sugunaraj, et al. 2021

Sensors


Heat loss quantification (HLQ) is an essential step in improving a building’s thermal performance and optimizing its energy usage. While this problem is well-studied in the literature, most of the existing studies are either qualitative or minimally driven quantitative studies that rely on localized building envelope points and are, thus, not suitable for automated solutions in energy audit applications. This research work is an attempt to fill this gap of knowledge by utilizing intensive thermal data (on the order of 100,000 plus images) and constitutes a relatively new area of analysis in energy audit applications. Specifically, we demonstrate a novel process using deep-learning methods to segment more than 100,000 thermal images collected from an unmanned aerial system (UAS). To quantify the heat loss for a building envelope, multiple stages of computations need to be performed: object detection (using Mask-RCNN/Faster R-CNN), estimating the surface temperature (using two clustering methods), and finally calculating the overall heat transfer coefficient (e.g., the U-value). The proposed model was applied to eleven academic campuses across the state of North Dakota. The preliminary findings indicate that Mask R-CNN outperformed other instance segmentation models with an mIOU of 73% for facades, 55% for windows, 67% for roofs, 24% for doors, and 11% for HVACs. Two clustering methods, namely K-means and threshold-based clustering (TBC), were deployed to estimate surface temperatures with TBC providing consistent estimates across all times of the day over K-means. Our analysis demonstrated that thermal efficiency not only depended on the accurate acquisition of thermal images but also relied on other factors, such as the building geometry and seasonal weather parameters, such as the outside/inside building temperatures, wind, time of day, and indoor heating/cooling conditions. 
Finally, the resultant U-values of various building envelopes were compared with recommendations from the American Society of Heating, Refrigerating, and Air-conditioning Engineers (ASHRAE) building standards.

