Efficient big data processing in Hadoop MapReduce

Dittrich, Jens; Quiané-Ruiz, Jorge-Arnulfo

doi:10.14778/2367502.2367562

Cited by 220 publications

(100 citation statements)

References 23 publications


“…The software platforms for smart cities should offer high performance computing capabilities, be optimized for the hardware being used, is stable and reliable for the different data-intensive applications being executed, supports stream processing, provides a high-levels of fault resilience, and is supported by a well-trained and capable team and vendor. There are different available software platforms for big data analytics such as Hadoop Mapreduce [28], HPCC [29], Stratosphere [30], and IBM Infosphere Streams [31], which provide the stream processing required by real-time big data applications such as intelligent transportations in a smart city [19]. These platforms work well on cluster systems that can provide a powerful and scalable hardware platform to meet the requirements of big data applications for smart cities.…”

Section: Big Data Management (mentioning)

confidence: 99%

Applications of big data to smart cities

Nuaimi, Neyadi, Mohamed et al. 2015

J Internet Serv Appl


Many governments are considering adopting the smart city concept in their cities and implementing big data applications that support smart city components to reach the required level of sustainability and improve the living standards. Smart cities utilize multiple technologies to improve the performance of health, transportation, energy, education, and water services leading to higher levels of comfort of their citizens. This involves reducing costs and resource consumption in addition to more effectively and actively engaging with their citizens. One of the recent technologies that has a huge potential to enhance smart city services is big data analytics. As digitization has become an integral part of everyday life, data collection has resulted in the accumulation of huge amounts of data that can be used in various beneficial application domains. Effective analysis and utilization of big data is a key factor for success in many business and service domains, including the smart city domain. This paper reviews the applications of big data to support smart cities. It discusses and compares different definitions of the smart city and big data and explores the opportunities, challenges and benefits of incorporating big data applications for smart cities. In addition it attempts to identify the requirements that support the implementation of big data applications for smart city services. The review reveals that several opportunities are available for utilizing big data in smart cities; however, there are still many issues and challenges to be addressed to achieve better utilization of this technology.


“…Paper [50] puts light on main issues and challenges of big data processing over MapReduce, by highlighting actual data management solutions that found over this computational platform. Several aspects are touched, including job optimization, physical data organization, data layouts, indexes, and so forth.…”

Section: Mapreduce Algorithms For Big Data Processing (mentioning)

confidence: 99%

An Effective and Efficient MapReduce Algorithm for Computing BFS-Based Traversals of Large-Scale RDF Graphs

2016


Nowadays, a leading instance of big data is represented by Web data that lead to the definition of so-called big Web data. Indeed, extending beyond to a large number of critical applications (e.g., Web advertisement), these data expose several characteristics that clearly adhere to the well-known 3V properties (i.e., volume, velocity, variety). Resource Description Framework (RDF) is a significant formalism and language for the so-called Semantic Web, due to the fact that a very wide family of Web entities can be naturally modeled in a graph-shaped manner. In this context, RDF graphs play a first-class role, because they are widely used in the context of modern Web applications and systems, including the emerging context of social networks. When RDF graphs are defined on top of big (Web) data, they lead to the so-called large-scale RDF graphs, which reasonably populate the next-generation Semantic Web. In order to process such kind of big data, MapReduce, an open source computational framework specifically tailored to big data processing, has emerged during the last years as the reference implementation for this critical setting. In line with this trend, in this paper, we present an approach for efficiently implementing traversals of large-scale RDF graphs over MapReduce that is based on the Breadth First Search (BFS) strategy for visiting (RDF) graphs to be decomposed and processed according to the MapReduce framework. We demonstrate how such implementation speeds-up the analysis of RDF graphs with respect to competitor approaches. Experimental results clearly support our contributions.


“…By designing of map reduce programming achieve high performance distributed processing and deals with hardware failure. Hadoop distributed file system is an efficient way to store data [19]. Here, master node splits the input data set into sub problems and distribute into the worker nodes, it process smaller problem in parallel manner and give back to the master node, then master node combines all sub problems at that instant perform answer to form output [20].…”

Section: Introduction (mentioning)

confidence: 99%

Efficient DANNLO Classifier for Multi-class Imbalanced Data on Hadoop

2017

IJMTER


Abstract--In recent years, multi-class imbalance data classification is a major problem in big data. In such situations, we focused on developing a new Deep Artificial Neural Network Learning Optimization (DANNLO) Classifier for large collection of imbalanced data. In our proposed work, first the dataset reduction using principal component analysis for dimensionality reduction and initial centroid is computed. Then, parallel hierarchical Pillar k-means clustering algorithm based on MapReduce is used to partitioning of an imbalanced data set into similar subset, which can improve the computational cost. The resultant clusters are given as input to the deep ANN for learning. In the next stage, deep neural network has been trained using the back propagation algorithm. In order to optimize the n-dimensional weight space, firefly optimization algorithm is used. Attractiveness and distance of each firefly is computed. Hadoop is used to handle these large volumes of variable size data. Imbalanced datasets is taken from ECDC (European Centre for Disease Prevention and Control) repository. The experimental results illustrated that the proposed method can significantly improve the effectiveness in classifying imbalanced data based on TP rate, F-measure, G-mean measures, confusion matrix, precision, recall, and ROC. The experimental results suggests that DANNLO classifier exceed other ordinary classifiers such as SVM and Random forest classifier on tested imbalanced data sets.
