2017
DOI: 10.5120/ijca2017915251

Big Data Analysis with Apache Spark

Abstract: Manipulating big data distributed over a cluster is one of the major challenges that most current big-data-oriented companies face. This is evident from the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework that aims to provide a solution for big data management. This paper presents a discussion of how Apache Spark technically helps with Big Data Analysis and Management.
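The abstract characterizes Spark as a fast, in-memory distributed collections framework. As a point of reference, here is a minimal PySpark sketch of the kind of in-memory distributed analysis the paper discusses; the application name, file path, and column names ("events.csv", "category", "value") are hypothetical placeholders, not details from the paper.

```python
# Minimal PySpark sketch: distributed, in-memory analysis of a large dataset.
# The file path and column names ("events.csv", "category", "value") are
# hypothetical placeholders, not taken from the paper.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("BigDataAnalysis").getOrCreate()

# Read a large CSV into a distributed DataFrame (partitioned across the cluster).
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# Cache the DataFrame in memory so repeated queries avoid re-reading from disk,
# the in-memory reuse that distinguishes Spark from classic MapReduce.
df.cache()

# A simple aggregation, executed in parallel across the cluster's partitions.
summary = df.groupBy("category").agg(F.avg("value").alias("avg_value"))
summary.show()

spark.stop()
```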

Cited by 2 publications (2 citation statements)
References 3 publications (3 reference statements)
“…Sequential Deep Learning was used as a classifier (processing engine) in the Spark cluster, which serves as a Big Data analytics framework (Figure 3). The DADEM model uses Apache Spark as a framework for implementing Big Data analytics through a distributed computing cluster. Spark is an open-source big data management framework that is built around ease of use, speed, and high-degree analytics. Spark gives a wide-ranging, collective framework for Big Data management and processing requirements for a variety of datasets. It makes use of memory and can exploit disk for processing data. Spark employs the concept of MapReduce and efficiently uses varied computations including interactive queries and stream processing (Singh, Anand, & B., 2017). The IoT environment needs distributed-processing traffic analysis that is based on techniques for Big Data analytics like Spark. Since our distributed environment contains some workers and masters in a cluster, communication between worker and master in each cluster must be scarce with low variability in order to maintain good performance for the proposed system. It is proposed that communication and coordination of work among worker nodes be based on an 'elastic force' that links parameters computed by the worker nodes to a center variable (Zhang, Choromanska, & LeCun, 2015). The master node (parameter server) stores the center variable, and this action is performed by the AEASGD method. With this in place, the proposed system will be able to work as an online attack detection system. Experiments were done for the two datasets based on 12 worker nodes in the Spark cluster. The implementation was done using the Databricks community cloud, which provides 15.3 GB of memory, two cores, and Spark version 2.4.5 with Python. The Databricks community cloud provides compatible Spark clusters with Keras libraries, which enabled us to implement the proposed model. The implementation was done using a Spark cluster that contains 12 worker nodes.…”
Section: Deep Learning With Spark Model
confidence: 99%
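The quoted passage describes worker parameters being pulled toward a center variable by an 'elastic force'. For context, here is a sketch of the elastic averaging SGD (EASGD) updates from the cited Zhang, Choromanska, & LeCun (2015) paper, of which AEASGD is the asynchronous variant; the notation (learning rate η, elastic coefficient ρ, p workers) follows that paper rather than the quoted text.

```latex
% Elastic averaging SGD (EASGD) updates, after Zhang, Choromanska & LeCun (2015).
% x_i: parameters at worker i; \tilde{x}: center variable held by the master
% (parameter server); \eta: learning rate; \rho: strength of the elastic force;
% p: number of workers.
\begin{align}
  x_i^{t+1} &= x_i^{t} - \eta\,\bigl(\nabla f_i(x_i^{t}) + \rho\,(x_i^{t} - \tilde{x}^{t})\bigr) \\
  \tilde{x}^{t+1} &= \tilde{x}^{t} + \eta\,\rho \sum_{i=1}^{p} \bigl(x_i^{t} - \tilde{x}^{t}\bigr)
\end{align}
```

Each worker is penalized for drifting from the center variable, while the center variable moves toward the average of the workers; in the asynchronous variant, each worker applies its update independently whenever it communicates with the master.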
“…Purchase transaction data from subscription commerce businesses is usually of large scale, since such businesses often acquire more and more detailed data per customer over time, building a continuously growing profile. Thus, processes like conditional filtering on big data structures are costly computations that involve a large amount of data [44]. To address this challenge we applied parallel data processing using the Apache Spark framework [45].…”
Section: B. Algorithm Design and Deployment
confidence: 99%
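This citing work applies Spark to parallelize conditional filtering over large purchase-transaction data. Below is a minimal, hypothetical PySpark sketch of such a filter; the dataset path, schema, and thresholds ("transactions.parquet", "customer_id", "amount") are illustrative assumptions, not details from the cited publication.

```python
# Hypothetical sketch of parallel conditional filtering with PySpark.
# The path, schema, and thresholds are illustrative assumptions, not
# details taken from the cited publication.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ConditionalFiltering").getOrCreate()

# Load the transaction data as a distributed DataFrame.
tx = spark.read.parquet("transactions.parquet")

# Filter predicates are evaluated partition-by-partition in parallel,
# so the cost scales with the cluster rather than a single machine.
frequent = (
    tx.filter(F.col("amount") > 100)       # keep larger purchases
      .groupBy("customer_id")              # aggregate per customer
      .count()
      .filter(F.col("count") >= 5)         # keep repeat purchasers
)

frequent.write.parquet("frequent_customers.parquet")

spark.stop()
```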