2018
DOI: 10.3390/sym10080342

A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark

Abstract: Unsupervised machine learning and knowledge discovery from large-scale datasets have recently attracted a lot of research interest. The present paper proposes a distributed big data clustering approach based on adaptive density estimation. The proposed method is developed on the Apache Spark framework and tested on several prevalent datasets. In the first step of this algorithm, the input data is divided into partitions using a Bayesian type of Locality Sensitive Hashing (LSH). Partitioning makes the pro…
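The abstract is truncated here, but the partitioning step it names can be illustrated. Below is a minimal sketch, assuming Spark's built-in random-projection LSH (pyspark.ml.feature.BucketedRandomProjectionLSH) as a stand-in for the paper's Bayesian LSH variant, which is not specified in the visible text; the toy data, bucketLength, and numHashTables values are illustrative assumptions, not the paper's settings.

```python
# Sketch of LSH-based partitioning: nearby points tend to share hash
# buckets, so the hash output can serve as a partitioning key for a
# subsequent per-partition density-estimation step. Uses standard
# random-projection (Euclidean) LSH, not the paper's Bayesian variant.
from pyspark.sql import SparkSession
from pyspark.ml.feature import BucketedRandomProjectionLSH
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("lsh-partitioning-sketch").getOrCreate()

# Toy dataset: each row is one point to be clustered.
points = spark.createDataFrame(
    [(0, Vectors.dense([1.0, 1.0])),
     (1, Vectors.dense([1.1, 0.9])),
     (2, Vectors.dense([9.0, 9.2]))],
    ["id", "features"],
)

lsh = BucketedRandomProjectionLSH(
    inputCol="features", outputCol="hashes",
    bucketLength=2.0, numHashTables=3,  # illustrative hyperparameters
)
model = lsh.fit(points)

# The added "hashes" column groups locality-preserving buckets together.
model.transform(points).show(truncate=False)
```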

Cited by 13 publications (6 citation statements). References 57 publications.
“…The outliers are filtered out by locality preservation, which makes this approach robust. The clusters are made highly homogeneous via a density definition based on the Ordered Weighted Averaging (OWA) distance [72]. A scalable distributed density-based clustering method for performing multi-regression tasks is proposed in [77].…”
Section: C3 Machine Learning Based Methods
confidence: 99%
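The OWA distance mentioned in the excerpt above aggregates coordinate-wise differences with ordered weights (Yager's OWA operator applied to absolute differences). A minimal sketch follows; the weight vector is an illustrative assumption, not a value taken from [72].

```python
# Sketch of an Ordered Weighted Averaging (OWA) distance: sort the
# componentwise absolute differences in descending order, then combine
# them with a fixed non-negative weight vector that sums to 1.
import numpy as np

def owa_distance(x: np.ndarray, y: np.ndarray, weights: np.ndarray) -> float:
    diffs = np.sort(np.abs(x - y))[::-1]  # largest difference first
    return float(np.dot(weights, diffs))

# Example: weights emphasizing the largest coordinate differences.
w = np.array([0.5, 0.3, 0.2])
print(owa_distance(np.array([1.0, 2.0, 3.0]),
                   np.array([2.0, 2.5, 3.1]), w))  # 0.5*1.0 + 0.3*0.5 + 0.2*0.1 = 0.67
```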
“…Behrooz Hosseini et al. (2018) [32] proposed a solution built and tested using the Apache Spark framework with a range of datasets.…”
Section: Recent Advancements
confidence: 99%
“…Apache Spark is a data-intensive application framework designed to process big data on commodity clusters [17]. The main difference between Spark and competing frameworks such as MapReduce is that Spark keeps the working dataset in memory, which enables iterative jobs to run repeated queries on big datasets.…”
Section: Spark Platform
confidence: 99%
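The in-memory behavior described in the excerpt above can be shown with a short PySpark sketch: caching a dataset lets iterative queries reuse the in-memory copy instead of re-evaluating the lineage from the source, which is the contrast with disk-oriented MapReduce. The toy dataset and thresholds below are illustrative assumptions.

```python
# Sketch of Spark's in-memory reuse for iterative workloads.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-sketch").getOrCreate()

# A toy dataset standing in for a large input.
df = spark.range(0, 1_000_000).withColumnRenamed("id", "value")
df.cache()   # mark the working set for in-memory storage
df.count()   # the first action materializes the cache

# Subsequent (iterative) queries read from memory rather than
# recomputing the full lineage from the source.
for threshold in (10, 100, 1000):
    print(threshold, df.filter(df.value < threshold).count())
```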