2022
DOI: 10.11591/eei.v11i3.3187

Distributed big data analysis using spark parallel data processing

Abstract: Nowadays, the big data marketplace is rising rapidly. The big challenge is finding a system that can store and handle a huge volume of data and then process it to mine hidden knowledge. This paper proposes a comprehensive system for improving big data analysis performance. It contains a fast big data processing engine using Apache Spark and a big data storage environment using Apache Hadoop. The system tests about 11 gigabytes of text data which are collected from multiple sour…



Cited by 7 publications (4 citation statements)
References 19 publications
“…Every node receives a tiny batch of data, which is then used to compute its gradient and send it back to the central node. Distributed training uses synchronous and asynchronous methods [29], [30].…”
Section: B. Data Parallel Model
confidence: 99%
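The synchronous variant described in the excerpt can be sketched in a few lines. This is a hypothetical single-process illustration, not code from the cited papers: each "worker" computes a gradient on its own mini-batch, and a central step averages the gradients before updating the shared parameter. The linear model, learning rate, and data shards are all invented for illustration.

```python
# Hypothetical sketch of synchronous data-parallel training: each worker
# computes a gradient on its own mini-batch, and a central node averages
# the gradients before updating the shared parameter.

def gradient(w, batch):
    """Gradient of mean squared error for a 1-D linear model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def synchronous_step(w, batches, lr=0.01):
    """One synchronous update: wait for every worker's gradient, then average."""
    grads = [gradient(w, b) for b in batches]  # one gradient per worker
    return w - lr * sum(grads) / len(grads)    # central node averages

# Example: two workers, each holding a shard of data generated by y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = synchronous_step(w, shards)
# w converges toward the true slope 3.0
```

In the asynchronous variant, by contrast, the central node would apply each worker's gradient as it arrives instead of waiting for all of them.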
“…MapReduce is a programming methodology created by Google to handle large-scale data analysis. It is based on the Hadoop framework [11], [58], [78]–[81], [64]–[67], [69]–[71], [74]. It is employed in a wide variety of applications.…”
Section: MapReduce
confidence: 99%
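The MapReduce pattern the excerpt refers to can be shown with a minimal single-process word count. This is only an illustrative sketch of the three phases (map, shuffle, reduce); a real Hadoop job would distribute these phases across a cluster, and the function names here are invented.

```python
from collections import defaultdict
from itertools import chain

# Minimal single-process sketch of the MapReduce pattern:
# map emits (key, value) pairs, shuffle groups pairs by key,
# reduce aggregates each group.

def map_phase(doc):
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data spark", "spark hadoop big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, docs))))
# counts == {"big": 2, "data": 2, "spark": 2, "hadoop": 1}
```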
“…Also, the characteristics of Spark are appropriate from the bottom up for handling big data, and it is much faster than other big data tools such as Hadoop. Besides, it supports many programming languages such as Java, Scala, Python, and R [19]. Fortunately, the Spark machine learning library consists of an implementation of the ALS algorithm for building a model in the form of collaborative filtering [20].…”
Section: Alternating Least Squares With Spark
confidence: 99%
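The alternating least squares (ALS) idea behind Spark's collaborative-filtering library can be illustrated with a tiny pure-Python, rank-1 sketch: approximate a ratings matrix R by an outer product u·vᵀ, solving for one factor vector while holding the other fixed. This is only a didactic assumption-laden toy (Spark's MLlib runs a distributed, regularized, higher-rank version); the matrix and function below are invented for illustration.

```python
# Hypothetical rank-1 ALS sketch for collaborative filtering:
# approximate R by the outer product u * v^T, alternately solving
# the least-squares problem for u with v fixed, then for v with u fixed.

def als_rank1(R, iterations=20):
    n_users, n_items = len(R), len(R[0])
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Fix v, solve least squares for each user factor u[i].
        for i in range(n_users):
            u[i] = sum(R[i][j] * v[j] for j in range(n_items)) / sum(x * x for x in v)
        # Fix u, solve least squares for each item factor v[j].
        for j in range(n_items):
            v[j] = sum(R[i][j] * u[i] for i in range(n_users)) / sum(x * x for x in u)
    return u, v

# A ratings matrix that is exactly rank 1, so the toy model can recover it.
R = [[2.0, 4.0, 6.0],
     [1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]
u, v = als_rank1(R)
error = sum((R[i][j] - u[i] * v[j]) ** 2 for i in range(3) for j in range(3))
# error is essentially zero: the factorization reproduces R
```

Each subproblem is an ordinary least-squares fit, which is why alternating between them parallelizes so naturally across a cluster.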