Task Scheduling in Big Data Platforms: A Systematic Literature Review

Soualhia, Mbarka; Khomh, Foutse; Tahar, Sofiène

doi:10.1016/j.jss.2017.09.001

Cited by 40 publications (25 citation statements)

References 55 publications (65 reference statements)


“…This paper presents a systematic literature review (SLR) on the current state of research associated with big data technologies in manufacturing [37]. To apply big data technologies in manufacturing successfully, it is essential to systematically review the literature of big data technologies in manufacturing from the following three perspectives: manufacturing data, big data technologies and data applications in manufacturing.…”

Section: Methods; mentioning (confidence: 99%)

Manufacturing big data ecosystem: A systematic literature review

Cui, Kara, Chan (2020)

Robotics and Computer-Integrated Manufacturing

211


Advanced manufacturing is one of the core national strategies in the US (AMP), Germany (Industry 4.0) and China (Made in China 2025). The emergence of Cyber-Physical Systems (CPS) and big data enables manufacturing to become smarter and more competitive among nations. Many researchers have proposed new solutions with big data enabling tools for manufacturing applications in three directions: product, production and business. Big data has been a fast-changing research area with many new opportunities for applications in manufacturing. This paper presents a systematic literature review of the state of the art of big data in manufacturing. Six key drivers of big data applications in manufacturing have been identified: system integration, data, prediction, sustainability, resource sharing and hardware. Based on the requirements of manufacturing, nine essential components of a big data ecosystem are captured: data ingestion, storage, computing, analytics, visualization, management, workflow, infrastructure and security. Several research domains driven by the available capabilities of the big data ecosystem are identified. Five future directions of big data applications in manufacturing are presented, ranging from modelling and simulation to real-time big data analytics and cybersecurity.



“…It is a necessity for an efficient load balancing system to use optimized scheduling algorithms [14]. Soualhia et al. [15] have done strong research on task scheduling in big data platforms. In this research, they mention that the scheduler in big data cloud platforms receives multiple jobs and tasks with different characteristics and different resource demands, which causes an imbalanced load in the system.…”

Section: Related Work; mentioning (confidence: 99%)

Optimized load balancing in high‐performance computing for big data analytics

Mirtaheri, Grandinetti (2021)

Concurrency and Computation


New-generation applications in big data and high-performance computing (HPC) demand very diverse operational properties, and the convergence of the two areas requires system components to behave dynamically. Load balancing is a critical issue given the highly unpredictable, dynamic, and data-oriented behavior of such systems. Practical constraints such as communication and load-transfer delays play an essential role in designing a dynamic load balancer. Moreover, because most new platforms are distributed, the load balancer should be able to operate in a fully distributed manner. In this research, we consider practical issues, including differences in processing power and storage capability as well as communication and load-transfer delays, and propose two distributed, optimized load-balancing methods in HPC for big data processing. We model the constraints and present an argument named the compensating factor for the optimized load balancer. We aim to minimize task execution time by reducing the nodes' idle time. We evaluate the proposed methods in different scenarios using Monte Carlo simulation. The evaluation results show that the proposed methods decrease idle time significantly while scaling with network size and remaining applicable to heterogeneous networks with dynamic resources and configurations.

Keywords: big data, distributed computing, high-performance computing, load balancing, optimization

INTRODUCTION

At present, data are being generated at an exponential rate owing to the massive number of sensors, Internet of Things (IoT) devices, and other connected devices. Big data sources such as social media, black box data, stock exchange data, power grid data, transport data, and search engine data need a huge amount of processing power, often in near-real-time scenarios. Data and information on this scale need to be managed by runtime tools. High-performance computing (HPC) systems are current solutions for vast and complex processing. However, traditional HPC tools are not adequate, and runtime tools are needed for big data processing on HPC platforms [1]. Big data refers to the emerging technologies designed to extract value from data having at least three Vs, namely volume, variety, and velocity. In other words, "big data" is a large and growing collection of information that can be structured, semi-structured, unstructured, or time-stamped. Statistical and regression techniques may be used to analyze data of this size [2]. One of the most important and challenging tools in HPC platforms is the load balancer. The load balancer divides the processing load among the available computing systems so that the work is completed in the minimum time, subject to the given constraints. In load balancing, the
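The load-balancing ideas in the excerpt and abstract above (tasks with different demands reaching a scheduler, nodes with different processing power, reducing idle time, Monte Carlo evaluation) can be illustrated with a small simulation. This is a hypothetical sketch, not the method of Mirtaheri and Grandinetti or of Soualhia et al.: it assumes each task is reduced to a single size and each node to a single speed, and it ignores communication and load-transfer delays. It compares naive round-robin assignment with a greedy largest-task-first, earliest-finish-time heuristic; all function names and parameter ranges are made up for illustration.

```python
"""Hypothetical sketch: assigning tasks to heterogeneous nodes to reduce idle time.

Not the algorithm of any cited paper. Task sizes and node speeds are
illustrative inputs; transfer and communication delays are ignored.
"""
import random


def round_robin(task_sizes, speeds):
    """Baseline: deal tasks to nodes in turn, ignoring node speed."""
    finish = [0.0] * len(speeds)
    for i, size in enumerate(task_sizes):
        node = i % len(speeds)
        finish[node] += size / speeds[node]
    return finish


def greedy_eft(task_sizes, speeds):
    """Largest task first; each goes to the node that would finish it earliest."""
    finish = [0.0] * len(speeds)
    for size in sorted(task_sizes, reverse=True):
        node = min(range(len(speeds)), key=lambda n: finish[n] + size / speeds[n])
        finish[node] += size / speeds[node]
    return finish


def idle_time(finish):
    """Total time nodes wait idle for the slowest node (the makespan)."""
    makespan = max(finish)
    return sum(makespan - f for f in finish)


def monte_carlo(trials=1000, n_tasks=50, n_nodes=8, seed=0):
    """Average idle time of both policies over random workloads and clusters."""
    rng = random.Random(seed)
    totals = {"round_robin": 0.0, "greedy_eft": 0.0}
    for _ in range(trials):
        tasks = [rng.uniform(1, 100) for _ in range(n_tasks)]
        speeds = [rng.uniform(1, 4) for _ in range(n_nodes)]
        totals["round_robin"] += idle_time(round_robin(tasks, speeds))
        totals["greedy_eft"] += idle_time(greedy_eft(tasks, speeds))
    return {name: total / trials for name, total in totals.items()}


if __name__ == "__main__":
    # Two nodes, the second three times faster: round robin leaves the fast
    # node idle for 8 time units, while the greedy policy finishes both at t=6.
    print(idle_time(round_robin([6, 6, 6, 6], [1, 3])))  # 8.0
    print(idle_time(greedy_eft([6, 6, 6, 6], [1, 3])))   # 0.0
    print(monte_carlo())
```

Placing the largest tasks first is the classic LPT idea: big tasks are committed early, and small ones fill the remaining gaps near the end. A production scheduler would also weigh data locality and the transfer delays this sketch leaves out.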


“…Reduce phase: In this phase, the reducer's job is to process the input data that comes from the mapper by analyzing and merging it to produce the final output, which is written to HDFS in the cluster. Some other programming models such as Spark [44,45] and DataMPI [46] are competing with MapReduce. Table 3 summarizes the big data capabilities and the available primary technologies [5]. MapReduce is open source and offers high performance, and it is used by many big companies for processing batch jobs [47,48].…”

Section: Hadoop MapReduce (Programming Paradigm); mentioning (confidence: 99%)

Using Hadoop Technology to Overcome Big Data Problems by Choosing Proposed Cost-efficient Scheduler Algorithm for Heterogeneous Hadoop System (BD3)

Hussein (2020)

JSRR


Advances in web technologies have led to tremendous daily growth in the volume of data generated. This mountain of huge, distributed data sets has given rise to the phenomenon called big data, a collection of massive, heterogeneous, unstructured and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their own limitations, so advances in computing architecture are required to handle both the storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Many technologies manipulate big data; one of them is Hadoop. Hadoop can be understood as an open-source distributed data processing framework and is one of the most prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the original three V's, which subsequent research has extended to more than fifty-six V's, and we compare researchers' definitions to arrive at the best representation and a precise clarification of all the V's. We highlight the challenges facing big data processing and show how Hadoop can overcome them, discussing its use in processing big data sets as a solution to various problems in distributed, cloud-based environments. The paper mainly focuses on Hadoop components such as Hive, Pig, and HBase. We also give a thorough description of Hadoop's pros and cons, and of improvements that address Hadoop's problems through the proposed Cost-efficient Scheduler Algorithm for heterogeneous Hadoop systems.
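The map and reduce phases described in the excerpt for this entry (mappers emit intermediate data; reducers merge it into the final output) can be sketched with the classic word-count example. This is a minimal in-memory illustration under stated assumptions: it does not use the Hadoop API or HDFS, the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical, and the grouping step stands in for the shuffle and sort that the framework itself performs between the two phases.

```python
"""Hypothetical sketch of the MapReduce map -> shuffle -> reduce flow (word count).

In-memory illustration only; it does not use Hadoop, and the function
names are made up for this example.
"""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit an intermediate (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: merge all values for one key into a single output record."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce over the input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # In Hadoop the reducer output would be written to HDFS; here it is a dict.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["big data big clusters", "data locality"]))
    # {'big': 2, 'clusters': 1, 'data': 2, 'locality': 1}
```

Keeping the mapper and reducer as deterministic functions of their inputs is one reason MapReduce can simply re-run a failed task on another node; in a real job the framework, not user code, performs the shuffle.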
