This paper quantitatively measures the impact of different data center networking topologies on the performance and energy efficiency of shuffling operations in MapReduce. Mixed Integer Linear Programming (MILP) models are used to optimize shuffling in several data center topologies with electronic, hybrid, and all-optical switching while maximizing throughput and reducing power consumption. The results indicate that the networking topology has a significant impact on the performance of MapReduce. They also indicate that, at comparable performance, optical-based data centers can achieve an average 54% reduction in energy consumption compared to electronic switching data centers.

Keywords: Data Center Networking (DCN), MapReduce, energy efficiency, completion time.
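To illustrate the kind of optimization such MILP models perform, the following is a minimal sketch, not the paper's actual model: all demands, capacities, and per-bit power figures are hypothetical, and the formulation (routing shuffle flows over an electronic or an optical fabric to minimize switching energy while delivering all demands) is deliberately simplified. It uses the open-source PuLP library.

```python
# Toy MILP: route map->reduce shuffle flows over two switching fabrics
# (electronic vs. optical) to minimize switching energy, subject to
# delivering every shuffle demand within each fabric's capacity.
# All numbers below are illustrative assumptions, not measured values.
from pulp import (LpProblem, LpVariable, LpMinimize, lpSum,
                  LpStatus, PULP_CBC_CMD)

maps = ["m1", "m2"]                  # servers hosting map slots
reduces = ["r1", "r2"]               # servers hosting reduce slots
demand = {("m1", "r1"): 4, ("m1", "r2"): 2,   # shuffle volumes (Gb), assumed
          ("m2", "r1"): 3, ("m2", "r2"): 5}
fabrics = ["electronic", "optical"]
energy_per_gb = {"electronic": 1.0, "optical": 0.46}  # relative, assumed
capacity = {"electronic": 10, "optical": 10}          # Gb per fabric, assumed

prob = LpProblem("shuffle_energy", LpMinimize)
# f[m, r, p]: amount of flow (m, r) carried over fabric p
f = {(m, r, p): LpVariable(f"f_{m}_{r}_{p}", lowBound=0)
     for m in maps for r in reduces for p in fabrics}

# Objective: total switching energy over all routed traffic
prob += lpSum(energy_per_gb[p] * f[m, r, p]
              for m in maps for r in reduces for p in fabrics)
# Every shuffle demand must be fully delivered
for (m, r), d in demand.items():
    prob += lpSum(f[m, r, p] for p in fabrics) == d
# Aggregate capacity constraint per fabric
for p in fabrics:
    prob += lpSum(f[m, r, p] for m in maps for r in reduces) <= capacity[p]

prob.solve(PULP_CBC_CMD(msg=False))
print(LpStatus[prob.status], prob.objective.value())
```

With these assumed numbers the solver fills the cheaper optical fabric to its 10 Gb capacity and routes the remaining 4 Gb electronically, giving a total energy of 10 x 0.46 + 4 x 1.0 = 8.6 units.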
INTRODUCTION

The MapReduce programming model and its widely used platform, Hadoop, are enabling several cost-effective cloud-based big data services [1]. These services typically require extensive all-to-all communication between hosting servers, leading to increased congestion and power consumption in data centers. Moreover, they result in East-West traffic dominating North-South traffic. This new traffic trend has become a central consideration in the design of state-of-the-art production data centers [2]. These challenges increasingly motivate the consideration of all-optical networking in future data centers to cope with the growing demands of big data applications while improving data center performance and decreasing power consumption [3].

The processing in MapReduce is composed of map, shuffle, and reduce phases. The input data is stored on the local disks of several servers and is globally managed by a distributed file system (DFS) [1]. Processing starts by assigning map slots according to the number of input data chunks and the available computing resources, and reduce slots according to the user's configuration. If there are more chunks than map slots, the map phase runs in several waves according to their scheduling [4]. Each map slot processes its assigned chunks, preferably available locally, and generates intermediate results in the form of <key, value> pairs. The intermediate results are shuffled to reduce slots according to their keys, where each reduce slot is assigned to process a unique set of keys [1]. Finally, each reduce slot sorts its inputs, calculates the final results, and saves them in the DFS.

Several optimization studies have been carried out by both academia and industry to enhance the performance and energy efficiency of big data applications (e.g. [2], [4]-[21]). The performance of big data applications and frameworks such as MapReduce is associated with a wide range of factors and parameters such as the cluster specifications (e.g. 
CPU, memory, networking, and disk I/O resources [9]), the framework and version used, and the selected configurations and mechanisms for data and job placement and task scheduling [4]-[8]. Moreover, as the deployments of big data applications are evolvin...