“…First, the dynamic implementation of each scenario takes longer to run than the static Spark implementation. This matches the results of the earlier experiments done for spark-dynamic (Lazovik et al, 2017). The reason for this is that we have added extra functionality on top of the existing static Spark code.…”
Section: Runtime Overhead (supporting)
confidence: 87%
“…In a previous work done by the authors (Lazovik et al, 2017), we have investigated the feasibility of dynamically updating the processing pipeline of an Apache Spark application. Apache Spark is one of the most popular big data processing platforms.…”
Section: Spark-dynamic (mentioning)
confidence: 99%
“…The performance of the prototype was also measured as part of the feasibility study, with promising results (Lazovik et al, 2017). The solutions from this paper are applied on top of this earlier system.…”
Section: Spark-dynamic (mentioning)
confidence: 99%
“…Next, we compare the adaptive framework with the spark-dynamic framework (Lazovik et al, 2017) and with regular Spark implementations. These experiments are based on three implemented scenarios inspired by real projects, each built using commonly used operations and increasing in complexity.…”
Section: Dynamic Versus Static (mentioning)
confidence: 99%
“…In a previous paper, we developed a framework, spark-dynamic (Lazovik et al, 2017), built on top of the popular distributed data processing platform Apache Spark (The Apache Software Foundation, 2015b) to enable the updating of the steps and algorithm parameters of running pipelines without restarting them. This process is called reconfiguration.…”
Distributed data processing systems have become the standard means for big data analytics. These systems are based on processing pipelines where operations on data are performed in a chain of consecutive steps. Normally, the operations performed by these pipelines are set at design time, and any changes to their functionality require the applications to be restarted. This is not always acceptable, for example, when we cannot afford downtime or when a long-running calculation would lose significant progress. The introduction of variation points to distributed processing pipelines allows for on-the-fly updating of individual analysis steps. In this paper, we extend such basic variation point functionality to provide fully automated reconfiguration of the processing steps within a running pipeline through an automated planner. We have enabled pipeline modeling through constraints. Based on these constraints, we not only ensure that configurations are compatible with type but also verify that expected pipeline functionality is achieved. Furthermore, automating the reconfiguration process simplifies its use, in turn allowing users with less development experience to make changes. The system can automatically generate and validate pipeline configurations that achieve a specified goal, selecting from operation definitions available at planning time. It then automatically integrates these configurations into the running pipeline. We verify the system through the testing of a proof-of-concept implementation. The proof of concept also shows promising results when reconfiguration is performed frequently.
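The abstract's core ideas — variation points that allow a running pipeline step to be swapped, type constraints that keep replacements compatible, and a planner that chains operations toward a goal — can be illustrated with a small sketch. This is a toy illustration only: the names (`VariationPoint`, `Operation`, `plan`) and the string-based typing are hypothetical and do not reflect the spark-dynamic API or the paper's actual planner.

```python
# Toy sketch: a swappable pipeline step ("variation point") plus a
# greedy type-driven planner. Illustrative only; not the paper's API.
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Operation:
    name: str
    in_type: str      # type the operation consumes
    out_type: str     # type the operation produces
    fn: Callable[[Any], Any]

class VariationPoint:
    """A pipeline step whose implementation can be replaced on the fly."""
    def __init__(self, op: Operation):
        self.op = op

    def reconfigure(self, new_op: Operation) -> None:
        # Type constraint: the replacement must consume and produce
        # the same types as the step it replaces.
        if (new_op.in_type, new_op.out_type) != (self.op.in_type, self.op.out_type):
            raise TypeError("replacement operation is not type-compatible")
        self.op = new_op

    def __call__(self, data: Any) -> Any:
        return self.op.fn(data)

def plan(ops: List[Operation], start_type: str, goal_type: str) -> List[Operation]:
    """Greedily chain operations whose types line up until the goal type
    is reached (a stand-in for the paper's constraint-based planner)."""
    chain, current = [], start_type
    while current != goal_type:
        nxt = next((o for o in ops if o.in_type == current), None)
        if nxt is None:
            raise ValueError(f"no operation consumes type {current!r}")
        chain.append(nxt)
        current = nxt.out_type
    return chain

# Example: swap an aggregation step without restarting the pipeline.
mean = Operation("mean", "series", "value", lambda xs: sum(xs) / len(xs))
median = Operation("median", "series", "value",
                   lambda xs: sorted(xs)[len(xs) // 2])

step = VariationPoint(mean)
print(step([1, 2, 3, 4]))   # 2.5
step.reconfigure(median)    # on-the-fly update, no restart
print(step([1, 2, 3, 4]))   # 3
```

In the real system the replacement code is injected into a live Spark job and validated against the pipeline's constraints before integration; here the reconfiguration is just a guarded attribute swap, which is enough to show the type-compatibility check the abstract describes.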
Dynamic software product lines offer software developers a way to exploit common and variable features across a set of requirements and thereby build entire families of products, making it possible to switch from one product configuration to another at runtime. These are product lines in which derivation occurs at runtime and involves reconfiguration both of the available services and of the underlying platform. The cloud, in turn, has enabled developers to build applications that can be reconfigured and redeployed dynamically and autonomously, independently of the underlying physical hardware infrastructure. Combined, these two strategies have the potential to produce highly reusable and reconfigurable software applications. In this paper we present an approach to achieving a DSPL using microservices. We propose two different derivation processes: one at design time based on binary replacement, and one at runtime that uses a feature model of the user's context and adaptation based on independent modular services.