Spark: A Big Data Processing Platform Based on Memory Computing

Han, Zhijie; Zhou, Yujie

doi:10.1109/paap.2015.41

Cited by 45 publications

(23 citation statements)

References 13 publications

“…• Programmable Clusters condition has brought a few difficulties: Firstly, numerous applications should be modified in a parallel way, and the programmable Clusters need to process more sorts of information figuring; Secondly, the adaptation to internal failure of the Clusters is progressively significant and troublesome; Thirdly, Clusters powerfully arrange the registering assets between shared clients, which builds the obstruction of the applications. With the quick increment of utilizations, Clusters figuring requires a working answer for suit various computations [18]. • Common difficulties during information change and highlight extraction include: Taking absolute information, (for example, nation for geolocation or classification for a motion picture) and encoding it in a numerical portrayal.…”

Section: A. Challenges in Existing Methodologies (mentioning)

confidence: 99%

The Deep Learning and Apache Spark Enabled Architecture for Improving the Performance of Big Data Classification

Brahmane, Krishna

2019

IJITEE

At present, Big Data applications such as informal communication, therapeutic human services, horticulture, banking, financial exchange, instruction, and Facebook are producing information at extremely high speed. The volume and velocity of Big Data play a significant role in the performance of Big Data applications, which can be influenced by various parameters. Search speed, efficiency, and accuracy are some of the dominant parameters that affect the overall performance of any Big Data application. Because the characteristics of the 7Vs of Big Data are directly and indirectly involved, every Big Data service is expected to deliver high performance, and high performance is the greatest challenge in the present evolving situation. In this paper we propose a Big Data classification approach to speed up Big Data applications. This is a review paper: we survey various Big Data technologies and the related work in the field of Big Data classification. After studying the literature, we identify the gaps in existing work and techniques. Finally, we propose a novel approach to Big Data classification that relies on Deep Learning and the Apache Spark architecture. The proposed work has two stages: the first is feature selection and the second is Big Data classification. Apache Spark is the most suitable and prevalent technology for implementing this proposed work. Apache Spark has two hubs, initial hubs and final hubs; feature selection takes place in the initial hubs and Big Data classification in the final hubs of Apache Spark.

“…Saving input-output and middle data in In-memory as a form of RDD(Resilient Distributed Dataset) facilitates more rapid processing speed because it could show high performance and rapid processing of conversational work road without additional cost or repetition of I/O. [2] Above figure shows a structure of Stack. There are standalone, Scheduler, YARN and Mesos for operating Spark in infraclass.…”

Section: A. Apache Flume (mentioning)

confidence: 99%

Motion Prediction in Rock Scissor Paper Game based on Machine Learning

Ryeol, Kim, Yeol et al.

2017

Sixth International Conference on Advances in Computing, Control and Networking - ACCN 2017

Abstract: The main aim of this system is to predict and analyze a user's gesture patterns using machine learning. The system is applied to the rock-paper-scissors game, where it is intended to always win by predicting the user's gesture. User gestures captured by an EMG sensor are quantized and processed to generate training data for the machine. Following the suggested procedure, a large amount of user gesture data will be collected and a model will be trained with machine learning. By applying the implemented system to the game, the research will verify that it is feasible to predict user gestures during play. In the game, the computer shows the result when the user starts playing rock-paper-scissors in front of the monitor, and the system's main purpose is to always show a winning result.

“…Spark is a new generation of distributed processing framework for big data following Hadoop. It has been rapidly pursued by academia and industry with its advanced design concept.…”

Section: Introduction (mentioning)

confidence: 99%

“…It has been rapidly pursued by academia and industry with its advanced design concept. It not only efficiently processes a large amount of data from different applications and data sources but also greatly reduces the number of disk I/Os by caching intermediate data of applications in memory and using a more powerful and flexible task scheduling mechanism based on directed acyclic graph (DAG). Because Spark implements the DAG execution engine, which can efficiently process data streams based on memory, it is 100 times faster in terms of memory‐based operations and 10 times faster in hard disk‐based operations than Hadoop Mapreduce according to the official test results…”

Section: Introduction (mentioning)

confidence: 99%
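
The statement above attributes part of Spark's speed to caching intermediate data in memory rather than recomputing or re-reading it. The following is a minimal pure-Python sketch of that idea only; it is not Spark's API, and the `LazyDataset` class and its `evaluations` counter are hypothetical names introduced for illustration.

```python
class LazyDataset:
    """Toy lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, compute):
        self._compute = compute      # function that produces the data
        self._use_cache = False
        self._cached = None
        self.evaluations = 0         # how many times the data was recomputed

    @classmethod
    def from_list(cls, items):
        return cls(lambda: list(items))

    def map(self, fn):
        # Nothing runs here; the child only remembers how to derive itself
        # from its parent, forming a small dependency chain.
        return LazyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Mark this dataset so its result is kept in memory after first use.
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache and self._cached is not None:
            return self._cached
        self.evaluations += 1
        result = self._compute()
        if self._use_cache:
            self._cached = result
        return result


def run_two_jobs(use_cache):
    """Run two downstream jobs that share one intermediate dataset."""
    squared = LazyDataset.from_list(range(5)).map(lambda x: x * x)
    if use_cache:
        squared.cache()
    first = squared.map(lambda x: x + 1).collect()
    second = squared.map(lambda x: x * 2).collect()
    return first, second, squared.evaluations
```

In this toy model, two jobs that reuse the same intermediate dataset recompute it twice without `cache()` and once with it, which is the kind of saving the quoted passage describes for iterative workloads.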

Handling data skew at reduce stage in Spark by ReducePartition

Guo, Huang, Tian

2019

Concurrency and Computation

Summary: As a typical representative of distributed computing frameworks, Spark has been continuously developed and popularized. It reduces the data transmission time through efficient memory-based operations and solves the shortcomings of the traditional MapReduce computation model in iterative computation. In Spark, data skew is very prominent due to the uneven distribution of input data and the unbalanced allocation of the default partitioning algorithm. When data skew occurs, the execution efficiency of the program is reduced, especially in the reduce stage of Spark. Therefore, this paper proposes ReducePartition to solve the data skew problem at the reduce stage of the Spark platform. First, the compute node samples the local data according to the sampling algorithm to predict the overall characteristics of the data distribution. Then, to make full use of cluster resources, ReducePartition divides data into multiple partitions evenly. Next, taking into account the differences in computational capabilities among Executors, each task is assigned to the Executor with the highest performance factor according to the greedy strategy. Finally, the results of the related algorithms and ReducePartition are compared using the WordCount and Sort benchmarks on a heterogeneous Spark standalone cluster. The performance of ReducePartition under different degrees of data skew and different data sizes is analyzed. Experimental results show that the proposed algorithm can effectively reduce the impact of data skew on the total makespan of Spark big data applications, and the average total makespan is reduced by 30% to 50% while resource utilization is increased by 20% to 30% on average.
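
The greedy assignment step mentioned in the summary above can be illustrated with a generic load-balancing sketch. This is not the ReducePartition algorithm itself, whose details are not given here; it assumes each executor has a known numeric performance factor and that a partition's cost is proportional to its size. The function name `greedy_assign` and both parameters are hypothetical.

```python
def greedy_assign(partition_sizes, performance_factors):
    """Assign partitions to executors greedily, largest partition first.

    Assumption: estimated finish time = assigned load / performance factor.
    Each partition goes to the executor that would finish it soonest.
    """
    loads = {name: 0.0 for name in performance_factors}
    assignment = {name: [] for name in performance_factors}

    # Visit partitions in descending size order (ties keep original order).
    ordered = sorted(enumerate(partition_sizes), key=lambda item: -item[1])
    for partition_id, size in ordered:
        best = min(
            performance_factors,
            key=lambda name: (loads[name] + size) / performance_factors[name],
        )
        loads[best] += size
        assignment[best].append(partition_id)

    # Makespan is the finish time of the slowest executor.
    makespan = max(loads[n] / performance_factors[n] for n in loads)
    return assignment, makespan
```

For example, with partition sizes `[8, 4, 4, 2]` and two executors whose assumed factors are 2.0 and 1.0, the faster executor receives more data so that both are estimated to finish at about the same time.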
