2017
DOI: 10.5120/ijca2017915251

Big Data Analysis with Apache Spark

Abstract: Manipulating big data distributed over a cluster is one of the major challenges that most current big-data-oriented companies face. This is evident from the popularity of MapReduce and Hadoop, and most recently Apache Spark, a fast, in-memory distributed collections framework that aims to provide a solution for big data management. This paper presents a discussion of how Apache Spark technically helps with Big Data Analysis and Management.
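The abstract characterizes Spark as a fast, in-memory distributed collections framework. As a point of reference, here is a minimal PySpark sketch of the kind of in-memory distributed analysis the paper discusses; the application name, file path, and column names ("events.csv", "category", "value") are hypothetical placeholders, not details from the paper.

```python
# Minimal PySpark sketch: distributed, in-memory analysis of a large dataset.
# The file path and column names ("events.csv", "category", "value") are
# hypothetical placeholders, not taken from the paper.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("BigDataAnalysis").getOrCreate()

# Read a large CSV into a distributed DataFrame (partitioned across the cluster).
df = spark.read.csv("events.csv", header=True, inferSchema=True)

# Cache the DataFrame in memory so repeated queries avoid re-reading from disk,
# the in-memory reuse that distinguishes Spark from classic MapReduce.
df.cache()

# A simple aggregation, executed in parallel across the cluster's partitions.
summary = df.groupBy("category").agg(F.avg("value").alias("avg_value"))
summary.show()

spark.stop()
```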

Cited by 2 publications (2 citation statements)
References 3 publications (3 reference statements)
“…Sequential Deep Learning was used as a classifier (processing engine) in the Spark cluster, which serves as a Big Data analytics framework (Figure 3). The DADEM model uses Apache Spark as a framework for implementing Big Data analytics through a distributed computing cluster. Spark is an open-source big data management framework that is built around ease of use, speed, and high-degree analytics. Spark gives a wide-ranging, collective framework for Big Data management and processing requirements for a variety of datasets. It makes use of memory and can exploit disk for processing data. Spark employs the concept of MapReduce and efficiently uses varied computations including interactive queries and stream processing (Singh, Anand, & B., 2017). The IoT environment needs distributed-processing traffic analysis that is based on techniques for Big Data analytics like Spark. Since our distributed environment contains some workers and masters in a cluster, communication between worker and master in each cluster must be scarce with low variability in order to maintain good performance for the proposed system. It is proposed that communication and coordination of work among worker nodes be based on an 'elastic force' that links parameters computed by the worker nodes to a center variable (Zhang, Choromanska, & LeCun, 2015). The master node (parameter server) stores the center variable, and this action is performed by the AEASGD method. With this in place, the proposed system will be able to work as an online attack detection system. Experiments were done for the two datasets based on 12 worker nodes in the Spark cluster. The implementation was done using the Databricks community cloud, which provides 15.3 GB of memory, two cores, and Spark version 2.4.5 with Python. The Databricks community cloud provides compatible Spark clusters with Keras libraries, which enabled us to implement the proposed model. The implementation was done using a Spark cluster that contains 12 worker nodes.…”
Section: Deep Learning With Spark Model
confidence: 99%
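The quoted passage describes worker parameters being pulled toward a center variable by an 'elastic force'. For context, here is a sketch of the elastic averaging SGD (EASGD) updates from the cited Zhang, Choromanska, & LeCun (2015) paper, of which AEASGD is the asynchronous variant; the notation (learning rate η, elastic coefficient ρ, p workers) follows that paper rather than the quoted text.

```latex
% Elastic averaging SGD (EASGD) updates, after Zhang, Choromanska & LeCun (2015).
% x_i: parameters at worker i; \tilde{x}: center variable held by the master
% (parameter server); \eta: learning rate; \rho: strength of the elastic force;
% p: number of workers.
\begin{align}
  x_i^{t+1} &= x_i^{t} - \eta\,\bigl(\nabla f_i(x_i^{t}) + \rho\,(x_i^{t} - \tilde{x}^{t})\bigr) \\
  \tilde{x}^{t+1} &= \tilde{x}^{t} + \eta\,\rho \sum_{i=1}^{p} \bigl(x_i^{t} - \tilde{x}^{t}\bigr)
\end{align}
```

Each worker is penalized for drifting from the center variable, while the center variable moves toward the average of the workers; in the asynchronous variant, each worker applies its update independently whenever it communicates with the master.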
“…Purchase transaction data from subscription commerce businesses is usually of large scale, since such businesses often acquire more and more detailed data per customer over time, building a continuously growing profile. Thus, processes like conditional filtering on big data structures are costly computations that involve a large amount of data [44]. To address this challenge we applied parallel data processing using the Apache Spark framework [45].…”
Section: B. Algorithm Design and Deployment
confidence: 99%
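This citing work applies Spark to parallelize conditional filtering over large purchase-transaction data. Below is a minimal, hypothetical PySpark sketch of such a filter; the dataset path, schema, and thresholds ("transactions.parquet", "customer_id", "amount") are illustrative assumptions, not details from the cited publication.

```python
# Hypothetical sketch of parallel conditional filtering with PySpark.
# The path, schema, and thresholds are illustrative assumptions, not
# details taken from the cited publication.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ConditionalFiltering").getOrCreate()

# Load the transaction data as a distributed DataFrame.
tx = spark.read.parquet("transactions.parquet")

# Filter predicates are evaluated partition-by-partition in parallel,
# so the cost scales with the cluster rather than a single machine.
frequent = (
    tx.filter(F.col("amount") > 100)       # keep larger purchases
      .groupBy("customer_id")              # aggregate per customer
      .count()
      .filter(F.col("count") >= 5)         # keep repeat purchasers
)

frequent.write.parquet("frequent_customers.parquet")

spark.stop()
```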