2016
DOI: 10.1007/s41060-016-0027-9
Big data analytics on Apache Spark

Abstract: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the …

Cited by 319 publications (152 citation statements)
References 58 publications
“…In other words, new batches are created from input DStreams depending on the batch interval length, and those discrete streams are stored in memory as RDD sequences. The RDDs are then executed by generating Spark jobs. The figure shows the architectural overview of the Spark Streaming scheme.…”
Section: Real-time Sentiment Prediction Framework
confidence: 99%
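A minimal Scala sketch of this micro-batching model, assuming a local master, a socket text source on localhost:9999, and a 2-second batch interval (all illustrative choices, not details from the cited paper): every batch interval, the records received are materialised as one RDD of the DStream, and each output operation generates a Spark job over that RDD.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Batch interval of 2 seconds: input received in each interval is
    // grouped into one micro-batch, stored in memory as one RDD of the DStream.
    val conf = new SparkConf().setAppName("StreamingSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))

    // A DStream backed by a socket source (host and port are placeholders).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations are declared once and re-applied to every batch;
    // the output operation below generates one Spark job per interval.
    val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()            // start receiving data and generating jobs
    ssc.awaitTermination() // block until the streaming context is stopped
  }
}

Note that local[2] reserves one thread for the receiver and one for processing, the minimum for a socket-based DStream to make progress.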
“…Spark is an open source framework for distributed computing [17]. It is a set of tools and software components structured according to a defined architecture.…”
Section: 1) Only Suitable For Processing Data On Batch 2) No Real Tim…
confidence: 99%
“…This is because MapReduce jobs need disk I/O operations to shuffle and sort the data during the Map and Reduce phases. Furthermore, Apache Spark provides rich APIs in several languages (Java, Scala, Python, and R) for developers to choose from in order to perform complex operations on distributed RDDs …”
Section: Introduction
confidence: 99%
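As a rough illustration of the contrast, the Scala sketch below chains several RDD transformations: intermediate results flow through memory, and only reduceByKey introduces a shuffle boundary, whereas an equivalent MapReduce pipeline would materialise data to disk between the Map and Reduce phases. The input path and application name are hypothetical.

import org.apache.spark.{SparkConf, SparkContext}

object RddPipelineSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RddPipelineSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Chained transformations are fused into stages and evaluated in memory;
    // there is no per-step disk materialisation as in MapReduce.
    val counts = sc.textFile("hdfs:///tmp/input.txt") // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // the only shuffle boundary in this pipeline
      .cache()            // keep the result in memory for reuse

    println(s"distinct words: ${counts.count()}")
    sc.stop()
  }
}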
“…While the Spark application is running, the driver program monitors the executors and sends tasks to them to run in multi-threaded mode. The Spark application keeps running until the SparkContext's stop method is invoked or the main function of the application finishes …”
Section: Introduction
confidence: 99%
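A minimal sketch of that lifecycle in Scala, assuming a local master with four executor threads (an illustrative stand-in for cluster executors, not a detail from the cited paper): the driver builds a SparkContext, each action is split into tasks shipped to executor threads, and the application ends when stop() is called or main returns.

import org.apache.spark.{SparkConf, SparkContext}

object DriverLifecycleSketch {
  def main(args: Array[String]): Unit = {
    // The driver program: creating the SparkContext acquires executors
    // from the cluster manager (here, local threads stand in for them).
    val conf = new SparkConf().setAppName("DriverLifecycleSketch").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Each action triggers a job; the driver splits it into tasks,
    // sends them to the executors, and monitors their progress.
    val sum = sc.parallelize(1L to 1000000L).reduce(_ + _)
    println(s"sum = $sum")

    // The application keeps running until stop() is invoked
    // (or until main returns, as the quoted passage notes).
    sc.stop()
  }
}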