2012
DOI: 10.14778/2367502.2367562
|View full text |Cite
|
Sign up to set email alerts
|

Efficient big data processing in Hadoop MapReduce

Abstract: This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming history. There are many techniques that can be used with Hadoop MapReduce jobs to boost performance by orders of magn… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

0
88
0
3

Year Published

2013
2013
2023
2023

Publication Types

Select...
7
3

Relationship

0
10

Authors

Journals

citations
Cited by 220 publications
(100 citation statements)
references
References 23 publications
0
88
0
3
Order By: Relevance
“…The software platforms for smart cities should offer high performance computing capabilities, be optimized for the hardware being used, is stable and reliable for the different data-intensive applications being executed, supports stream processing, provides a high-levels of fault resilience, and is supported by a well-trained and capable team and vendor. There are different available software platforms for big data analytics such as Hadoop Mapreduce [28], HPCC [29], Stratosphere [30], and IBM Infosphere Streams [31], which provide the stream processing required by real-time big data applications such as intelligent transportations in a smart city [19]. These platforms work well on cluster systems that can provide a powerful and scalable hardware platform to meet the requirements of big data applications for smart cities.…”
Section: Big Data Managementmentioning
confidence: 99%
“…The software platforms for smart cities should offer high performance computing capabilities, be optimized for the hardware being used, is stable and reliable for the different data-intensive applications being executed, supports stream processing, provides a high-levels of fault resilience, and is supported by a well-trained and capable team and vendor. There are different available software platforms for big data analytics such as Hadoop Mapreduce [28], HPCC [29], Stratosphere [30], and IBM Infosphere Streams [31], which provide the stream processing required by real-time big data applications such as intelligent transportations in a smart city [19]. These platforms work well on cluster systems that can provide a powerful and scalable hardware platform to meet the requirements of big data applications for smart cities.…”
Section: Big Data Managementmentioning
confidence: 99%
“…Paper [50] puts light on main issues and challenges of big data processing over MapReduce, by highlighting actual data management solutions that found over this computational platform. Several aspects are touched, including job optimization, physical data organization, data layouts, indexes, and so forth.…”
Section: Mapreduce Algorithms For Big Data Processingmentioning
confidence: 99%
“…By designing of map reduce programming achieve high performance distributed processing and deals with hardware failure. Hadoop distributed file system is an efficient way to store data [19]. Here, master node splits the input data set into sub problems and distribute into the worker nodes, it process smaller problem in parallel manner and give back to the master node, then master node combines all sub problems at that instant perform answer to form output [20].…”
Section: Introductionmentioning
confidence: 99%