Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually not known in advance. Although the Elbow method is one of the most commonly used methods for determining the optimal cluster number, it depends on manual identification of the elbow point on the visualization curve. As a result, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal cluster number when clustering a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of the intersection angle at each candidate elbow point. Third, the arccosine function is applied to these cosines to compute the intersection angles themselves. Finally, the index of the minimal intersection angle is used as the estimated potential optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the estimated optimal cluster number obtained by the newly proposed method is better than that obtained by the widely used Silhouette method.
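The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the angles are formed, so the sketch assumes that each angle is taken at a point (k, normalized distortion) between its two neighbours on the curve, and that the cosine comes from the law of cosines. The function names `normalize`, `elbow_angles`, and `estimate_k` are hypothetical, and the distortion values in the usage note are made up for illustration.

```python
import math


def normalize(values, low=0.0, high=10.0):
    """Rescale values linearly so the smallest maps to low and the largest to high."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("distortions are constant; no elbow can be located")
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def elbow_angles(distortions, ks):
    """Return {k: angle in radians} for every interior k on the distortion curve.

    Assumption: the angle at k is the one formed at the point (k, y_k) by the
    segments to its left and right neighbours, with y normalized to 0..10.
    """
    y = normalize(distortions)
    angles = {}
    for i in range(1, len(ks) - 1):
        a = (ks[i - 1], y[i - 1])
        b = (ks[i], y[i])
        c = (ks[i + 1], y[i + 1])
        ab, bc, ac = math.dist(a, b), math.dist(b, c), math.dist(a, c)
        # Law of cosines gives the cosine of the angle at vertex b.
        cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2 * ab * bc)
        # Clamp to guard against floating-point drift outside [-1, 1].
        cos_b = max(-1.0, min(1.0, cos_b))
        angles[ks[i]] = math.acos(cos_b)
    return angles


def estimate_k(distortions, ks):
    """Pick the k whose intersection angle is smallest (the sharpest bend)."""
    angles = elbow_angles(distortions, ks)
    return min(angles, key=angles.get)
```

As a usage example, calling `estimate_k([100, 40, 10, 8, 7, 6], [1, 2, 3, 4, 5, 6])` on these illustrative distortions selects the k at which the curve bends most sharply. In practice the distortions would come from running a clustering algorithm such as k-means once per candidate k.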
The volume of data generated by modern astronomical telescopes is extremely large and rapidly growing. However, current high-performance data processing architectures/frameworks are not well suited for astronomers because of their limitations and programming difficulties. In this paper, we therefore present OpenCluster, an open-source distributed computing framework that supports the rapid development of high-performance processing pipelines for astronomical big data. We first detail the OpenCluster design principles and implementations and present the APIs facilitated by the framework. We then demonstrate a case in which OpenCluster is used to resolve complex data processing problems in developing a pipeline for the Mingantu Ultrawide Spectral Radioheliograph. Finally, we present our OpenCluster performance evaluation. Overall, OpenCluster provides not only high fault tolerance and simple programming interfaces, but also a flexible means of scaling up the number of interacting entities. OpenCluster thereby provides an easily integrated distributed computing framework for quickly developing high-performance data processing systems for astronomical telescopes and for significantly reducing software development expenses.