Wiley StatsRef: Statistics Reference Online 2018
DOI: 10.1002/9781118445112.stat07978

Big Data Clustering

Abstract: Clustering algorithms group data items based on clearly defined similarity between the items aiming to minimize the intracluster differences and maximize the intercluster distances. A wealth of efficient and good quality clustering algorithms are already available for traditional data, but there are challenges for applying them to big data due to the overwhelming volume and complexities of such data. Data volume is getting bigger at an incredible pace due to growing access to Internet, social media, mobile dev…

Citations: cited by 10 publications (9 citation statements)
References: 30 publications

“…Multiple machine‐based clustering environments distribute the workload across multiple servers and commodity machines by parallel execution such as map‐reduced or peer‐to‐peer network variations. A single machine environment uses single server resources through batch processing and enables the addition of additional resources as necessary [2,23,24] …”
Section: Introduction (mentioning)
confidence: 99%
“…Classical clustering algorithms are facing various challenges due to data volume, variety, and velocity. The data volume is defining the computational cost, speed, efficiency, and scalability challenges of the classical clustering algorithms (Khondoker, 2018; Maheswari & Ramakrishnan, 2019). Big data clustering focuses on scale-up, speed-up, optimizing computation costs, and resources without the effect of cluster quality.…”
Section: Introduction (mentioning)
confidence: 99%
“…Big data clustering focuses on scale-up, speed-up, optimizing computation costs, and resources without the effect of cluster quality. The design of the big data clustering is dependent upon the single-machine and multiple-machine execution environment (Khondoker, 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Data mining algorithms are required to improve upon their computational cost, speed, scalability, flexibility, and efficiency according to the essential characteristics of big data [7]. Extraction of appropriate hidden predictive information, patterns and relations from the heterogeneous large‐scale dataset is big data mining, [8] which requires higher transparency for volume, variety and velocity because large‐scale data contains valuable knowledge and information [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…Contributions in the past [7,15,16,19,20] addressed cluster creation techniques under single and multiple machine execution environments of big data mining. Clustering techniques for big data mining are categorized into divide‐and‐conquer, parallel, center reduction, efficient nearest neighbor (NN) search, sampling, dimension reduction, incremental, and condensation methods.…”
Section: Introduction (mentioning)
confidence: 99%