Abstract: As data volumes and the need for timely analysis grow, Big Data analytics frameworks have to scale out to hundreds or even thousands of commodity servers. While such a scale-out is crucial to sustain the desired computational throughput/latency and storage capacity, it comes at the cost of increased network traffic volumes and a multiplicity of traffic patterns. Despite the clear dependency between the datacenter network (DCN) and time-to-insight in big data analysis, our experience as active networki…
“…MaxiNet is capable of emulating large‐scale SDN cloud environments along with evaluating new SDN‐powered routing algorithms. Moreover, to allow Mininet to mimic the behaviors of MapReduce applications, MRemu 32 was introduced where it is capable of using realistic MapReduce workloads/traces within Mininet environments. It operates on latency periods extracted from MapReduce job traces (duration of tasks, waiting times, and so forth).…”
The integration and cross‐coordination of big data processing and software‐defined networking (SDN) are vital for improving the performance of big data applications. Various approaches for combining big data and SDN have been investigated by both industry and academia. However, empirical evaluations of solutions that combine big data processing and SDN are extremely costly and complicated. To address the problem of effectively evaluating such solutions, we present a new, self‐contained simulation tool named BigDataSDNSim that enables the modeling and simulation of the big data management system YARN, its associated programming model MapReduce, and SDN‐enabled networks in a cloud computing environment. BigDataSDNSim supports cost‐effective, easy‐to‐conduct experimentation in a controllable, repeatable, and configurable manner. The article demonstrates the simulation accuracy and correctness of BigDataSDNSim by comparing the behavior and results of a real environment that combines big data processing and SDN with an equivalent simulated environment. Finally, the article presents two use cases of BigDataSDNSim that exhibit its practicality and features, illustrate the impact of the data replication mechanisms of MapReduce in Hadoop YARN, and show the superiority of SDN over traditional networks in improving the performance of MapReduce applications.
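The citation context above describes MRemu as operating on latency periods (task durations, waiting times) extracted from MapReduce job traces. The Python sketch below illustrates that trace-driven idea under stated assumptions: the `TaskRecord` schema, the `latency_periods` and `replay` helpers, and the `transfer_time` callback are hypothetical names invented for this example, not MRemu's actual input format or API. It also assumes that recorded task durations exclude network transfer time, so that only the transfer component comes from the emulated network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TaskRecord:
    # Hypothetical trace schema for illustration; not MRemu's real format.
    task_id: str
    submit: float   # time the task was submitted (s)
    start: float    # time the task started running (s)
    finish: float   # time the task finished (s)


def latency_periods(records: List[TaskRecord]) -> Dict[str, Tuple[float, float]]:
    """Extract (waiting_time, duration) for each task from a job trace."""
    return {r.task_id: (r.start - r.submit, r.finish - r.start) for r in records}


def replay(records: List[TaskRecord],
           transfer_time: Callable[[str], float]) -> Dict[str, float]:
    """Return each task's emulated finish time.

    Waiting times and compute durations are replayed as fixed delays taken
    from the trace (assumed to exclude network time); only the transfer time
    comes from the emulated network via the hypothetical transfer_time callback.
    """
    periods = latency_periods(records)
    finish = {}
    for r in records:
        waiting, duration = periods[r.task_id]
        finish[r.task_id] = r.submit + waiting + transfer_time(r.task_id) + duration
    return finish


# Usage with a made-up two-task trace: the reduce task's shuffle takes 2 s
# on the emulated network, the map task has no network transfer.
trace = [TaskRecord("map_0", 0.0, 1.0, 5.0), TaskRecord("reduce_0", 5.0, 6.0, 9.0)]
print(replay(trace, lambda tid: 2.0 if tid.startswith("reduce") else 0.0))
# → {'map_0': 5.0, 'reduce_0': 11.0}
```

Keeping compute delays fixed while varying only the network component is what allows a trace-driven setup of this kind to attribute changes in job completion time to the network rather than to the hosts.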
“…For the experiments, we used the MRemu tool [Neves et al 2015], an emulation-based framework for network research using MapReduce workloads. MRemu uses network traces, created from executions of Hadoop applications on real clusters, as the basis for producing emulated traffic loads in Mininet.…”
The growth in data volume has revolutionized business and science while demanding ever-greater computing resources. High-performance computing (HPC) platforms, traditionally used for massively parallel numerical simulations, offer computational capacity that can be leveraged for Big Data analysis. However, the convergence of Big Data and HPC must be examined from several angles; in particular, the network infrastructure must adapt to the very different demands of these applications. The software-defined networking (SDN) model can favor this convergence thanks to its global view of the network and its programmability. In this context, we present an SDN platform capable of meeting, in a converged manner, the requirements of both Big Data and HPC applications. The platform applies the routing mechanisms best suited to each traffic profile, thereby reducing application execution time. We demonstrate the feasibility of our platform through simulations, reducing the execution time of real MPI applications in specific scenarios by up to 11% and of Hadoop applications by up to 6%.
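The abstract states that the platform applies the routing mechanism best suited to each traffic profile but does not say which mechanisms. As a hypothetical illustration only, the Python sketch below assumes two profiles: MPI traffic is routed on the fewest-hop path (treated here as latency-sensitive), and Hadoop traffic is routed on the widest path, that is, the path with the largest bottleneck bandwidth (treated here as throughput-sensitive). The topology, link capacities, and function names are invented for the example, and enumerating paths by depth-first search is only practical for small topologies.

```python
from typing import Dict, List

# Hypothetical topology: node -> {neighbor: available bandwidth in Gbit/s}.
Graph = Dict[str, Dict[str, float]]


def simple_paths(g: Graph, src: str, dst: str) -> List[List[str]]:
    """Enumerate all loop-free paths from src to dst with an iterative DFS."""
    paths, stack = [], [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in g[node]:
            if nxt not in path:  # avoid revisiting nodes (no loops)
                stack.append((nxt, path + [nxt]))
    return paths


def bottleneck(g: Graph, path: List[str]) -> float:
    """Smallest available bandwidth along the path."""
    return min(g[a][b] for a, b in zip(path, path[1:]))


def choose_path(g: Graph, src: str, dst: str, profile: str) -> List[str]:
    """Pick a route according to an assumed per-profile policy.

    'mpi'    -> fewest hops; ties broken by higher bottleneck bandwidth.
    'hadoop' -> widest path; ties broken by fewer hops.
    """
    candidates = simple_paths(g, src, dst)
    if profile == "mpi":
        return min(candidates, key=lambda p: (len(p), -bottleneck(g, p)))
    if profile == "hadoop":
        return max(candidates, key=lambda p: (bottleneck(g, p), -len(p)))
    raise ValueError(f"unknown traffic profile: {profile}")


# Made-up topology: a short 1 Gbit/s path (s1-s2) and a longer 10 Gbit/s detour.
topo: Graph = {
    "h1": {"s1": 10.0},
    "s1": {"h1": 10.0, "s2": 1.0, "s3": 10.0},
    "s2": {"s1": 1.0, "s4": 10.0, "h2": 10.0},
    "s3": {"s1": 10.0, "s4": 10.0},
    "s4": {"s3": 10.0, "s2": 10.0},
    "h2": {"s2": 10.0},
}
print(choose_path(topo, "h1", "h2", "mpi"))     # → ['h1', 's1', 's2', 'h2']
print(choose_path(topo, "h1", "h2", "hadoop"))  # → ['h1', 's1', 's3', 's4', 's2', 'h2']
```

In an SDN setting, a controller with a global view of the topology could compute such per-profile routes and install them as flow rules; how the cited platform actually classifies traffic and installs routes is not stated in the abstract.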
“…The main limitation of Mininet is that it does not support application-level infrastructures. To allow Mininet to mimic the behaviors of MapReduce applications, MRemu 16 was introduced where it is capable of using realistic MapReduce workloads/traces within Mininet environments. It operates based on latency periods extracted from MapReduce job traces (duration of tasks, waiting times, etc.).…”
Emerging paradigms of big data and Software-Defined Networking (SDN) in cloud data centers have gained significant attention from industry and academia. The integration and coordination of big data and SDN are required to improve the application and network performance of big data applications. While empirical evaluation and analysis of big data and SDN is one way of assessing proposed solutions, it is often impractical for several reasons: it is expensive, time-consuming, and complex, and it is beyond the reach of most individuals. Thus, simulation tools are preferable for performing cost-effective, scalable experimentation in a controllable, repeatable, and configurable manner. To fill this gap, we present a new, self-contained simulation tool named BigDataSDNSim that enables the modeling and simulation of big data management systems (YARN), big data programming models (MapReduce), and SDN-enabled networks within cloud computing environments. To demonstrate the efficacy, effectiveness, and features of BigDataSDNSim, a use case that compares SDN-enabled networks with legacy networks in terms of performance and energy consumption is presented.
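The abstract does not describe BigDataSDNSim's internal model. As a loose illustration of the kind of question such a simulator answers (how network capacity affects MapReduce completion time), the Python sketch below computes an event-driven estimate for a single job: map tasks are assigned to the earliest-free container using a min-heap of container free times, and a single reduce phase starts once all maps have finished and the shuffle data has crossed the network. The function name, the single-reducer assumption, the fixed shuffle bandwidth, and the example numbers are all hypothetical; this is not BigDataSDNSim's model or API, and it ignores link contention, data replication, and energy.

```python
import heapq
from typing import List


def simulate_job(map_durations: List[float], slots: int,
                 shuffle_bytes: float, bandwidth_bps: float,
                 reduce_duration: float) -> float:
    """Return the completion time (s) of one simplified MapReduce job.

    Map tasks are scheduled in FIFO order onto `slots` containers; each task
    takes the container that becomes free earliest. The reduce phase starts
    after the last map finishes and the shuffle completes at bandwidth_bps.
    """
    free_at = [0.0] * slots            # time at which each container is free
    heapq.heapify(free_at)
    maps_done = 0.0
    for d in map_durations:
        start = heapq.heappop(free_at)  # earliest available container
        end = start + d
        maps_done = max(maps_done, end)
        heapq.heappush(free_at, end)
    shuffle_time = shuffle_bytes * 8 / bandwidth_bps  # bytes -> bits
    return maps_done + shuffle_time + reduce_duration


# Made-up job: four 4 s map tasks on two containers, 1 GB of shuffle data,
# and a 2 s reduce, with 1 Gbit/s versus 10 Gbit/s of effective bandwidth.
slow = simulate_job([4.0] * 4, slots=2, shuffle_bytes=1e9, bandwidth_bps=1e9, reduce_duration=2.0)
fast = simulate_job([4.0] * 4, slots=2, shuffle_bytes=1e9, bandwidth_bps=1e10, reduce_duration=2.0)
print(slow, fast)  # → 18.0 10.8
```

Under these made-up numbers, raising the effective shuffle bandwidth from 1 Gbit/s to 10 Gbit/s shortens the job from 18.0 s to 10.8 s. A real comparison of SDN-enabled and legacy networks would instead derive the effective bandwidth from routing decisions and link contention.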