Clustering techniques are widely applied in data analytics to interpret the similarities among data objects in large datasets. Although the literature offers many families of clustering algorithms, such as connectivity-based, centroid-based, distribution-based, and density-based methods, they differ in what they consider to constitute a cluster. In every case, however, the success of clustering depends on maximizing intra-cluster similarity and inter-cluster dissimilarity. The importance of clustering in many real-world applications continues to motivate proposals for new algorithms. In this paper, a novel approach called Equilin Clustering is proposed, which groups data objects from a different perspective using a multidimensional geographical linear equation. The technique combines the standard linear equation with the percentage-split method to cluster numerical data. The results show that Equilin Clustering yields better clusters with reduced complexity in terms of time and number of iterations.
Outlier detection is a part of data analytics that helps users find discrepancies in working machines by applying an outlier detection algorithm to data captured at fixed intervals. An outlier is a data point that exhibits properties different from those of other points because of some external or internal force. Outliers can be detected by clustering the data points, so an optimal clustering is important. A problem that arises frequently in statistics is the identification of groups, or clusters, of data within a population or sample. The most widely used procedure for identifying clusters in a set of observations is k-means with Euclidean distance. However, Euclidean distance is not well suited to finding anomalies in multivariate space, because it ignores the differing scales of, and correlations between, the variables. This chapter uses the k-means algorithm with the Mahalanobis distance metric to capture the variance structure of the clusters, followed by an extreme value analysis (EVA) algorithm to detect outliers, that is, rare items, events, or observations that raise suspicion by differing from the majority of the data.
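The two-stage idea described above (cluster with a Mahalanobis metric, then examine the tail of the resulting distances) can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the function names, the regularization term `reg`, the explicit `init` centers, and the mean-plus-three-standard-deviations cutoff used as a simple stand-in for EVA are all assumptions made for illustration. A k-means variant of this kind can also degenerate when one cluster's covariance grows large, so the sketch is only meant for well-separated toy data.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov_inv):
    # Squared Mahalanobis distance of every row of X to `mean`.
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def kmeans_mahalanobis(X, k, init=None, n_iter=20, reg=1e-6, seed=0):
    # Hypothetical helper: k-means whose assignment step uses each
    # cluster's own covariance (the first pass is plain Euclidean).
    n, d = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    else:
        centers = np.array(init, dtype=float)
    cov_invs = np.stack([np.eye(d)] * k)

    def all_dists():
        return np.stack(
            [mahalanobis_sq(X, centers[j], cov_invs[j]) for j in range(k)],
            axis=1,
        )

    for _ in range(n_iter):
        labels = all_dists().argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > d:  # need enough points for a covariance
                centers[j] = members.mean(axis=0)
                cov = np.cov(members, rowvar=False) + reg * np.eye(d)
                cov_invs[j] = np.linalg.inv(cov)

    dists_sq = all_dists()
    labels = dists_sq.argmin(axis=1)
    return labels, np.sqrt(dists_sq[np.arange(n), labels])

def flag_extremes(dists, n_sigma=3.0):
    # Assumed stand-in for EVA: flag distances far out in the upper tail.
    return dists > dists.mean() + n_sigma * dists.std()

# Toy data: two well-separated blobs plus one planted outlier (last row).
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
blob_b = rng.normal(loc=10.0, scale=1.0, size=(100, 2))
X = np.vstack([blob_a, blob_b, [[-8.0, -8.0]]])

labels, dists = kmeans_mahalanobis(X, k=2, init=[[0.0, 0.0], [10.0, 10.0]])
flags = flag_extremes(dists)
```

Here `flags` is a boolean mask over the rows of `X`. In practice the cutoff would be replaced by whatever tail model the EVA step prescribes, and the initial centers would come from a proper seeding strategy rather than being supplied by hand.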