Survey on Task Assignment Techniques in Hadoop

Patil, S. T.; Deshmukh, Shyam

doi:10.5120/9617-4256

Cited by 5 publications

(3 citation statements)

References 5 publications

“…In addition, numerous of scheduling algorithms are coming into being. For example, Delay Scheduler [5], which is based on enhancing data locality; Dynamic Proportional Scheduler [6], which is based on user preferences and changing the way of task allocation proportion and dynamic priority; Constraint-Based Scheduler [7], which is based on deadline of Real-time; [8] proposed a scheduling algorithm which considers when the task scheduler can't choose the Data-local task whether it allows to assign the Non-local task or not; [9] proposed a scheduling algorithm which is based on the number of Map task node and data sheet replication mode; LATE [10] (Longest Approximate Time to End) and SAMR [11] (Self-adaptive MapReduce Scheduling Algorithm), which are based on under heterogeneous environment how to improve the scheduling efficiency; [12] and [13] proposed a scheduling algorithm which are based on job types classifying and the load dynamic of heterogeneous; [14] proposed a scheduling algorithm which is based on the adaptive node capacity; [15] proposed a scheduling algorithm which is based on matching rules; beyond above, the scheduler based on the intelligent algorithm and simulated annealing algorithm [16] and artificial fish algorithm [17] and genetic algorithm [18]; etc. however all of these scheduling algorithm don't consider from the resources allocation model, their task scheduling strategy is to distinguish the slot type.…”

Section: Related Work

mentioning

confidence: 99%

A Hadoop Job Scheduling Model Based on Uncategorized Slot

Xue; Li

2015

jcm


Job scheduling is becoming an important part of the Hadoop framework. This paper studies a job scheduling model based on uncategorized slots. The model removes the restriction on task type and no longer distinguishes between Map slots and Reduce slots; instead, a single type of slot remains, which can be assigned to run either Map tasks or Reduce tasks. By adopting Reduce dynamic partitioning, the model can switch slots smoothly between the two types of tasks. Compared with the FIFO algorithm, which must distinguish slot types, the experimental results show that the model not only improves resource utilization and load balancing, but also increases task parallelism and shortens job execution time.

“…To the best of our knowledge, there is no published literature that clearly articulates the problem of scheduling in big data frameworks and provides a research taxonomy for succinct classification of the existing scheduling techniques in Hadoop, Spark, Storm, and Mesos frameworks. Previous efforts [6], [7] [8] that attempted to provide a comprehensive review of scheduling issues in big data platforms were limited to Hadoop only. Moreover, they did not include all papers that were published during the periods covered by their studies (i.e., 2012 and 2015).…”

Section: Introduction

mentioning

confidence: 99%

Task Scheduling in Big Data Platforms: A Systematic Literature Review

Soualhia; Khomh; Tahar

2017

Journal of Systems and Software


Context: Hadoop, Spark, Storm, and Mesos are well-known frameworks in both research and industrial communities that allow expressing and processing distributed computations on massive amounts of data. Multiple scheduling algorithms have been proposed to ensure that short interactive jobs, large batch jobs, and guaranteed-capacity production jobs running on these frameworks can deliver results quickly while maintaining a high throughput. However, only a few works have examined the effectiveness of these algorithms.

Objective: The Evidence-based Software Engineering (EBSE) paradigm and its core tool, the Systematic Literature Review (SLR), were introduced to the Software Engineering community in 2004 to help researchers systematically and objectively gather and aggregate research evidence about different topics. In this paper, we conduct an SLR of task scheduling algorithms that have been proposed for big data platforms.

Method: We analyse the design decisions of different scheduling models proposed in the literature for Hadoop, Spark, Storm, and Mesos over the period between 2005 and 2016. We provide a research taxonomy for succinct classification of these scheduling models. We also compare the algorithms in terms of performance, resource utilization, and failure recovery mechanisms.

Results: Our searches identify 586 studies from journals, conferences, and workshops having the highest quality in this field. This SLR reports on different types of scheduling models (dynamic, constrained, and adaptive) and the main motivations behind them (including data locality, workload balancing, resource utilization, and energy efficiency). A discussion of some open issues and future challenges pertaining to improving the current studies is provided.


“…Furthermore, no studies have succinctly discussed the Hadoop scheduling problem and provide a research taxonomy for classifying existing scheduling techniques. Early efforts [5][6][7] to conduct a detailed study of Hadoop platform scheduling problems were limited in scope.…”

mentioning

confidence: 99%

Factors affecting cloud data-center efficiency: a scheduling algorithm-based analysis

Shehloo; Butt; Zaman

2021

IJATEE


Nowadays, users are required to cache, scrutinise, and process massive datasets from various fields, including science, business, and research. As a result, they require data-intensive platforms with ample storage and processing power. In addition, many of these platforms must have features like parallel processing, fault tolerance, data dissemination, scalability, availability, and load balancing. Google developed the MapReduce programming paradigm to counter this problem, which served as the foundation for Apache's open-source Hadoop project. Hadoop relies upon a particular file system designated as HDFS, analogous to Google's File-System (GFS). It splits the massive data into equally sized segments and then places them across multiple nodes in a Hadoop cluster [1]. As a result, Hadoop is now widely accepted as a data analytics model [2]. Hadoop's fundamental operating principle is that "moving computation to data is less expensive than moving data to computation." As a result, Hadoop tries to schedule tasks on local data nodes to minimise network traffic [3]. Task scheduling is critical in Hadoop because it significantly impacts the framework's computation time and, thus, its overall performance [4]. However, given the dynamic nature of the cloud environment, proposing an effective task scheduling strategy is a constant challenge. Nevertheless, only a few studies have analyzed the proposed techniques and their overall effect on the Hadoop framework's
