2021
DOI: 10.1109/access.2021.3084057

Scalable Clustering Algorithms for Big Data: A Review

Abstract: Clustering algorithms have become one of the most critical research areas in multiple domains, especially data mining. However, with the massive growth of big data applications in the cloud world, these applications face many challenges and difficulties. Since Big Data refers to an enormous amount of data, most traditional clustering algorithms come with high computational costs. Hence, the research question is how to handle this volume of data and get accurate results at a critical time. Despite ongoing resea…

Cited by 35 publications (20 citation statements)
References 112 publications
“…In the literature, there are two main types of clustering algorithms that are often combined with IVIs to determine the ONC: hierarchical and partitional clustering [8]. For the hierarchical clustering, according to the direction of clustering, there are two types of methods: agglomerative hierarchical clustering (AHC) and divisive hierarchical clustering (DHC) [21], where the former follows the bottom-top strategy, which treats each sample as a complete cluster at the beginning, and then gradually merges them into some larger cluster based on a certain criterion, and on the contrary, the DHC adopts the top-down strategy, which initially regards the entire dataset as a complete cluster and then splits the dataset into some smaller clusters based on a certain criterion [22]. Compared with the DHC, the AHC is more accurate and widely used [23], and in recent years, the classic AHC with single linkage, complete linkage, average linkage and ward linkage are still the most widely used AHC methods [24,25].…”
Section: Clustering Algorithms
Citation type: mentioning
confidence: 99%
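
The bottom-up strategy described in the statement above can be illustrated with a minimal sketch. It is not taken from the reviewed paper or from the citing paper. The function name `agglomerative`, the sample `points`, and the choice of Euclidean distance are assumptions made for illustration, and Ward linkage is omitted to keep the example short. The naive pairwise search also hints at why such algorithms become expensive on large datasets.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def agglomerative(points, k, linkage="single"):
    """Naive agglomerative clustering (illustrative, hypothetical helper).

    Starts with one cluster per sample and repeatedly merges the two
    closest clusters until only k clusters remain.
    """
    # How pairwise sample distances are reduced to one cluster distance.
    reducers = {
        "single": min,                            # closest pair
        "complete": max,                          # farthest pair
        "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
    }
    reduce_fn = reducers[linkage]

    # Each cluster is a list of sample indices.
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return reduce_fn([dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > k:
        # Exhaustive search over all cluster pairs; a < b always holds here.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    return [sorted(c) for c in clusters]


# Assumed toy data: two tight pairs and one isolated sample.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, 3, linkage="single"))
```

Every merge step rescans all cluster pairs, so the cost grows quickly with the number of samples. Library implementations avoid much of this repeated work, but the merge logic is the same idea.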
“…This is the size of input data. The size of data was presented as a factor that affects the selection of clustering algorithm by Andreopoulos et al (2009), Shirkhorshidi et al (2014) and more recently Mahdi et al (2021). They observed that some clustering algorithms perform poorly and sacrifice quality when the size of data increases in volume, velocity, variability and variety.…”
Section: Components and Classifications For Data Clustering
Citation type: mentioning
confidence: 99%
“…Aggarwal et al (2003) described data stream as large volumes of data arriving at an unlimited growth rate. As noted by Mahdi et al (2021) data types that are vast and complex to store such as social network data (referred to as big data) and high-speed data (data stream) such as web-click streams, network traffic could be challenging to cluster. In addition, they emphasized that the type of data type considered often influences the type of clustering techniques selected.…”
Section: Data Size Dimensionality and Data Type Issues In Clustering
Citation type: mentioning
confidence: 99%
“…Clustering analysis is an important part of data mining and the basis of some data mining methods [3]. Its application scenarios are very wide, in computer science [4], [5], [6], biological [7], [8], chemistry [9], society [10] and…”
Section: Introduction
Citation type: mentioning
confidence: 99%