2018
DOI: 10.3390/sym10080342

A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark

Abstract: Unsupervised machine learning and knowledge discovery from large-scale datasets have recently attracted a lot of research interest. The present paper proposes a distributed big data clustering approach based on adaptive density estimation. The proposed method is developed on the Apache Spark framework and tested on several prevalent datasets. In the first step of this algorithm, the input data is divided into partitions using a Bayesian type of Locality Sensitive Hashing (LSH). Partitioning makes the pro…
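The abstract is truncated here, but the partitioning step it names can be illustrated. Below is a minimal sketch, assuming Spark's built-in random-projection LSH (pyspark.ml.feature.BucketedRandomProjectionLSH) as a stand-in for the paper's Bayesian LSH variant, which is not specified in the visible text; the toy data, bucketLength, and numHashTables values are illustrative assumptions, not the paper's settings.

```python
# Sketch of LSH-based partitioning: nearby points tend to share hash
# buckets, so the hash output can serve as a partitioning key for a
# subsequent per-partition density-estimation step. Uses standard
# random-projection (Euclidean) LSH, not the paper's Bayesian variant.
from pyspark.sql import SparkSession
from pyspark.ml.feature import BucketedRandomProjectionLSH
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("lsh-partitioning-sketch").getOrCreate()

# Toy dataset: each row is one point to be clustered.
points = spark.createDataFrame(
    [(0, Vectors.dense([1.0, 1.0])),
     (1, Vectors.dense([1.1, 0.9])),
     (2, Vectors.dense([9.0, 9.2]))],
    ["id", "features"],
)

lsh = BucketedRandomProjectionLSH(
    inputCol="features", outputCol="hashes",
    bucketLength=2.0, numHashTables=3,  # illustrative hyperparameters
)
model = lsh.fit(points)

# The added "hashes" column groups locality-preserving buckets together.
model.transform(points).show(truncate=False)
```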

Cited by 13 publications (6 citation statements). References 57 publications.
“…The outliers are filtered out by locality preservation, which makes this approach robust. The clusters are made highly homogeneous via a density definition based on the Ordered Weighted Averaging (OWA) distance [72]. A scalable distributed density-based clustering method for performing multi-regression tasks is proposed in [77].…”
Section: C3 Machine Learning Based Methods
confidence: 99%
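The OWA distance mentioned in the excerpt above aggregates coordinate-wise differences with ordered weights (Yager's OWA operator applied to absolute differences). A minimal sketch follows; the weight vector is an illustrative assumption, not a value taken from [72].

```python
# Sketch of an Ordered Weighted Averaging (OWA) distance: sort the
# componentwise absolute differences in descending order, then combine
# them with a fixed non-negative weight vector that sums to 1.
import numpy as np

def owa_distance(x: np.ndarray, y: np.ndarray, weights: np.ndarray) -> float:
    diffs = np.sort(np.abs(x - y))[::-1]  # largest difference first
    return float(np.dot(weights, diffs))

# Example: weights emphasizing the largest coordinate differences.
w = np.array([0.5, 0.3, 0.2])
print(owa_distance(np.array([1.0, 2.0, 3.0]),
                   np.array([2.0, 2.5, 3.1]), w))  # 0.5*1.0 + 0.3*0.5 + 0.2*0.1 = 0.67
```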
“…Behrooz Hosseini et al. (2018) [32] proposed a solution built and tested using the Apache Spark framework with a range of datasets.…”
Section: Recent Advancements
confidence: 99%
“…Apache Spark is a data-intensive application framework designed to process big data on commodity clusters [17]. The main difference between Spark and competing frameworks such as MapReduce is that Spark keeps the working dataset in memory, which enables iterative jobs to run repeated queries on big datasets.…”
Section: Spark Platform
confidence: 99%
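The in-memory behavior described in the excerpt above can be shown with a short PySpark sketch: caching a dataset lets iterative queries reuse the in-memory copy instead of re-evaluating the lineage from the source, which is the contrast with disk-oriented MapReduce. The toy dataset and thresholds below are illustrative assumptions.

```python
# Sketch of Spark's in-memory reuse for iterative workloads.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-sketch").getOrCreate()

# A toy dataset standing in for a large input.
df = spark.range(0, 1_000_000).withColumnRenamed("id", "value")
df.cache()   # mark the working set for in-memory storage
df.count()   # the first action materializes the cache

# Subsequent (iterative) queries read from memory rather than
# recomputing the full lineage from the source.
for threshold in (10, 100, 1000):
    print(threshold, df.filter(df.value < threshold).count())
```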