2020
DOI: 10.7717/peerj-cs.321

Big data clustering techniques based on Spark: a literature review

Abstract: A popular unsupervised learning method, known as clustering, is extensively used in data mining, machine learning and pattern recognition. The procedure involves grouping of single and distinct points in a group in such a way that they are either similar to each other or dissimilar to points of other clusters. Traditional clustering methods are greatly challenged by the recent massive growth of data. Therefore, several research works proposed novel designs for clustering methods that leverage the benefits of B…

Cited by 27 publications (12 citation statements)
References 61 publications
“…The main disadvantages of unsupervised learning are unable to provide accurate information concerning data sorting and computationally complex. One of the most popular unsupervised learning approaches is clustering [ 54 ].…”
Section: Classification Of DL Approaches
Citation type: mentioning
confidence: 99%
“…We investigated the characteristic angiographic pattern of cerebral vasospasm by visual classification accompanied by a mathematical clustering algorithm at ten evaluated reference points. Clustering is one of the most common unsupervised machine learning tasks [ 15 ] allowing objective classification of vessel diameters without predefined thresholds. The distribution of mathematically clustered vessel diameters matched the criteria of the visual classification in many points.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…The former indicates the similarity between objects in the same cluster, and the latter implies the difference of objects between different clusters. The purpose of clustering is to maximize the homogeneity of the same cluster and the heterogeneity of different clusters [10]. Driven by these two concepts, various types of clustering methods have been introduced.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%
“…The ability to balance exploration and exploitation is the concern of all metaheuristic algorithms [63]. The analysis of EHO reveals that the worst positioned agents are only randomly modified by (10). This kind of approach lacks some variation mechanism, which makes the exploitation capacity insufficient and thus leads to a slow convergence.…”
Section: Motivation
Citation type: mentioning
confidence: 99%