2019
DOI: 10.1186/s40537-019-0236-x

Big data clustering with varied density based on MapReduce

Abstract: With the recent growth and advancement of information technology, data is being produced at a very high rate in a variety of fields and is presented to users in structured, semi-structured, and unstructured forms [1]. New technologies are needed for storing this volume of data (big data) and extracting useful information from it, because discovering and extracting useful information and knowledge from such a volume is difficult; hence, traditional relational databases cannot meet the needs of us…
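Judging from the title and the citing work quoted further below, the paper's method (MR-VDBSCAN) applies density-based clustering in a MapReduce setting so that different regions of the data can use different density thresholds. The following is a minimal, self-contained sketch of that general idea only, not the paper's algorithm: the partitioning scheme, the per-partition eps heuristic, and the merge rule are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def local_eps(points, k=3):
    # Assumed heuristic: median of each point's distance to its k-th
    # nearest neighbour within the partition.
    kth = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if len(dists) >= k:
            kth.append(dists[k - 1])
    kth.sort()
    return kth[len(kth) // 2] if kth else float("inf")

def dbscan(points, eps, min_pts=3):
    """Textbook DBSCAN; returns a cluster id per point, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        frontier = [j for j in neighbours if j != i]
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:        # noise reached from a core point: border
                labels[j] = cluster_id
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                frontier.extend(j_neighbours)
    return labels

def map_phase(partition_id, points):
    # Map step: cluster one partition with its own locally estimated eps,
    # so partitions of different density use different thresholds
    # (the "varied density" idea).
    eps = local_eps(points)
    labels = dbscan(points, eps)
    return [((partition_id, labels[i]), pt) for i, pt in enumerate(points)]

def reduce_phase(mapped):
    # Reduce step (illustrative): group points by (partition, local cluster id).
    # A full implementation would also merge clusters that touch across
    # partition borders.
    clusters = defaultdict(list)
    for key, pt in mapped:
        clusters[key].append(pt)
    return dict(clusters)

if __name__ == "__main__":
    # Two toy partitions with very different densities.
    dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    sparse = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0), (30.0, 30.0)]
    mapped = map_phase(0, dense) + map_phase(1, sparse)
    for key, pts in sorted(reduce_phase(mapped).items()):
        print(key, pts)
```

The reason for estimating eps per partition is that a single global threshold either over-merges dense regions or fragments sparse ones; letting each mapper pick its own threshold is what makes varied density tractable.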

Cited by 34 publications (14 citation statements) · References 27 publications
“…It aims to analyze pre-processed stored data in order to find correlations, identify patterns, and create actionable insights. There are mainly four categories through which Big Data analysis can be designed and conducted: prescriptive, predictive, diagnostic, and descriptive [32, 36, 52–56]. Next, we describe each of these categories:…”
Section: Discussion
confidence: 99%
“…To estimate the runtime of a job in Hadoop MapReduce, we first investigated the anatomy of a Hadoop job and the precise stages of running a job [1–5]. Since Hadoop repeatedly runs the same applications on the same data type [17], we use the profiling method, meaning that there is a separate table in the database for each application.…”
Section: Methods
confidence: 99%
“…Therefore, the time of each stage must be calculated and summed. Since Hadoop runs on a distributed system, many factors and parameters affect T_map and T_reduce [1–5, 22]. So, we investigate the parameters with the most impact on runtime.…”
Section: Estimating Runtime For The First Run
confidence: 99%
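The two statements above outline the citing paper's runtime-estimation approach: keep a profiling table of observed stage times per application, then estimate a job's total runtime as the sum of its map and reduce stage times (T_map + T_reduce). Below is a minimal sketch of that bookkeeping; the table contents, application name, and function name are hypothetical.

```python
from statistics import mean

# Assumed profile store: application name -> list of (t_map, t_reduce)
# stage times, in seconds, observed in earlier runs on the same data type.
profiles = {
    "wordcount": [(42.0, 11.0), (40.5, 12.2), (43.1, 10.8)],
}

def estimate_runtime(app: str) -> float:
    """Estimate T_total = T_map + T_reduce from the app's profiled history."""
    runs = profiles[app]
    t_map = mean(t for t, _ in runs)        # average observed map-stage time
    t_reduce = mean(t for _, t in runs)     # average observed reduce-stage time
    return t_map + t_reduce

print(f"estimated runtime: {estimate_runtime('wordcount'):.1f} s")
```

A real estimator would also condition on the parameters the quote mentions (input size, cluster configuration, and so on) rather than a plain average, but the profile-then-sum structure is the same.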
“…These models can be applied to evaluate several methods in MapReduce. Heidari et al. [26] discussed clustering with variable density based on huge data. In this method, they presented MR-VDBSCAN.…”
Section: Related Work
confidence: 99%