Fuzzy Based Scalable Clustering Algorithms for Handling Big Data Using Apache Spark

Bharill, Neha; Tiwari, Aruna; Malviya, Aayushi

doi:10.1109/tbdata.2016.2622288

Cited by 59 publications

(22 citation statements)

References 25 publications


“…The space complexity of Fast Kernel Matrix Computation was higher. The Scalable Random Sampling with Iterative Optimization Fuzzy c-Means algorithm (SRSIO-FCM) was introduced in [15] to address the challenges involved during big data clustering. The clustering performance of SRSIO-FCM was not efficient therefore lacks clustering accuracy.…”

Section: Related Work (mentioning)

confidence: 99%

Moore Data Clustering Based Bloom Hash Storage for Dimensionality Reduction of Big Data Analytics

K*¹, Maheswari.² 2019. IJRTE


Big data contains massive amounts of information that are difficult to manage, acquire, store and analyze. Clustering is a demanding problem in big data analytics. Existing clustering techniques do not perform efficiently, and their time complexity is high. Further, reducing the dimensionality of big data has not been addressed effectively. To overcome these limitations, a Moore Data Clustering based Bloom Hash Storage (MDC-BHS) technique is proposed. The MDC-BHS technique is designed to reduce the dimensionality of big data in less time through clustering. It uses the Moore Data Clustering (MDC) model to group the data in a big dataset with minimum time consumption. After clustering, it employs the Bloom Hash Storage (BHS) model to store the clustered data with minimum space complexity. The BHS model is a space-efficient probabilistic data structure that uses hashing functions to create hash values for the clustered data. In this way, the proposed MDC-BHS technique significantly reduces the dimensionality of large datasets. The technique is evaluated experimentally on weather data in terms of clustering time, clustering accuracy and space complexity with respect to the number of data items. The results show that the MDC-BHS technique improves clustering accuracy and reduces space complexity compared with state-of-the-art works.
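
The abstract describes its storage model only as a space-efficient probabilistic data structure built on hashing. As a rough illustration of that general idea, here is a minimal sketch of a generic Bloom filter. It is not the paper's BHS model: the class name, the bit-array size, the number of hashes, the SHA-256 salting scheme, and the record labels are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: k hashed positions over a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # 8 bits per byte

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical labels standing in for clustered records.
bf = BloomFilter()
for record in ["cluster0:record17", "cluster1:record42"]:
    bf.add(record)
```

Storage stays fixed at 128 bytes however many records are added. The price is that a membership test can return a false positive, though never a false negative.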



“…K. Peng et al [17] has proposed a clustering strategy for IDS dependent on Mini Batch K-means joined with important part examination. Initial, a pre-processing technique was proposed to digitize the strings and afterward the informational index was standardized in order to improve the clustering proficiency.…”

Section: Related Work (mentioning)

confidence: 99%

KM-MBFO: A Hybrid Hadoop Map Reduce Access for Clustering Big Data by Adopting Modified Bacterial Foraging Optimization Algorithm

2019

IJRTE


K-Means clustering is a powerful and frequently used clustering algorithm, but it has its own limitations. The prevalent K-Means (KM) algorithm suffers from inadequacies such as a slow convergence rate and a tendency to become trapped in local optima. Therefore, many swarm-intelligence-based procedures combined with KM have been presented for clustering, together with their performance, variations and applications in data clustering. In this paper we propose a parallel strategy for the KM-MBFO mechanism, implemented on the Hadoop Distributed File System (HDFS), to reduce the execution time. The Mapper produces the population for the given dataset to be clustered. The Modified Bacterial Foraging Optimization (MBFO) algorithm evaluates the fitness of the population to choose the optimal K values in terms of execution time and classification error. Through simulated test results, we assess the performance of the proposed KM-MBFO scheme.


“…In‐memory processes enhance the effectiveness of program execution that the performance is superior to offline storage. However, in‐memory processing performance has scope for improvement; for example, the execution time of a certain type of instruction, such as reduceByKey() and groupByKey() in join operations, is longer than for general instructions. The data are sorted among offline storages depending on the time consumed.…”

Section: Introduction (mentioning)

confidence: 99%

“…The differences between Spark and Hadoop in intermediate data buffer result in high performance of iterative applications and interactive data mining with Spark [17,25]. Dharanipragada et al proposed Generate-Map-Reduce (GMR), which was an extension to MapReduce, to support iterative jobs and a distributed communication model by using shared data structures. GMR captured recursive computations by modeling iterative applications, such as simulated annealing and A* search.…”

mentioning

confidence: 99%

Performance enhancement for iterative data computing with in‐memory concurrent processing

Wen, Chen, Chiu et al. 2019. Concurrency and Computation


Summary: The big data era has resulted in the development of several data analysis tools. Spark is an in‐memory processing tool suited to iterative and interactive data mining. It offers higher data‐processing performance than MapReduce, which relies on offline storage. However, some disadvantages of in‐memory processing, such as massive in‐memory data requirements, cause cross‐node data transfers that result in long computation times. Performance can be improved if the in‐memory process is executed with fewer shuffle instructions. Therefore, this study aims to enhance the performance of iterative applications through instruction replacement. Three empirical research cases with diverse datasets and iterations are used to modify the program. We adopt a strategy of downloading a small resilient distributed dataset and replacing the shuffle‐included instructions to shorten the processing time, with automated code replacement based on exhaustive code matching. The experimental results reveal an improvement of up to 39% in execution time compared with the existing in‐memory processing programs across various dataset sizes.
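
To make the link between shuffle volume and processing time concrete, the following plain-Python sketch models why combining values inside each partition before a shuffle moves fewer records between nodes. It does not use Spark and does not reproduce the paper's code-replacement method. The partitions, the function names, and the record counts are hypothetical; the only Spark behaviour assumed is the documented one that `reduceByKey` combines locally before shuffling while `groupByKey` does not.

```python
from collections import defaultdict

# Hypothetical (key, value) records as they might sit on two workers.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]


def shuffle_all(parts):
    """groupByKey-style model: every record crosses the shuffle unchanged."""
    return [rec for part in parts for rec in part]


def shuffle_combined(parts):
    """reduceByKey-style model: sum per key within each partition first."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())  # one record per key per partition
    return shuffled


def reduce_side(records):
    """Final per-key aggregation after the shuffle."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


grouped = shuffle_all(partitions)
combined = shuffle_combined(partitions)
print(len(grouped), len(combined))  # records moved in each model
print(reduce_side(grouped) == reduce_side(combined))  # same final totals
```

In this toy input both models produce the same per-key totals, but the combined variant moves 4 records instead of 7. The gap widens as keys repeat more often within a partition.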

