2020
DOI: 10.1016/j.future.2020.07.007

A programming model for Hybrid Workflows: Combining task-based workflows and dataflows all-in-one

Abstract: In the past years, e-Science applications have evolved from large-scale simulations executed in a single cluster to more complex workflows where these simulations are combined with High-Performance Data Analytics (HPDA). To implement these workflows, developers are currently using different patterns; mainly task-based and dataflow. However, since these patterns are usually man […]
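The programming model the abstract describes combines task-based workflows (atomic tasks with data dependencies) and dataflows (continuous streams processed as elements arrive) in a single application. The snippet below is a minimal, plain-Python sketch of that hybrid pattern; the task decorator, the queue-based stream, and the function names are illustrative stand-ins, not the PyCOMPSs/DistroStream API presented in the paper.

```python
# Minimal, self-contained sketch of a hybrid workflow: a task-based stage
# (simulation) feeds a dataflow stage (analytics) through a stream.
# The decorator and the stream are plain-Python stand-ins, NOT the
# PyCOMPSs / DistroStream API described in the paper.
import queue
import threading


def task(func):
    """Stand-in for a task annotation: runs the function asynchronously."""
    def wrapper(*args, **kwargs):
        t = threading.Thread(target=func, args=args, kwargs=kwargs)
        t.start()
        return t
    return wrapper


@task
def simulation(stream, n_steps):
    # Task-based stage: publishes partial results as they are computed.
    for step in range(n_steps):
        stream.put(step * step)          # publish an element to the stream
    stream.put(None)                     # end-of-stream marker


@task
def analytics(stream, results):
    # Dataflow stage: consumes elements as soon as they are available,
    # instead of waiting for the whole simulation to finish.
    total = 0
    while (item := stream.get()) is not None:
        total += item
    results.append(total)


if __name__ == "__main__":
    stream, results = queue.Queue(), []
    producer = simulation(stream, n_steps=10)
    consumer = analytics(stream, results)
    producer.join()
    consumer.join()
    print("aggregated result:", results[0])   # 285
```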

Cited by 12 publications (11 citation statements)
References 13 publications
“…However, various constraints in business processes and workflows often depend on the correctness of data. Even if the control flow is correct, the data flow may not be; it is therefore important to analyze and restore the data flow in business processes and workflows [6]. Moreover, there is little research on data flow recovery…”
Section: Related Work
Mentioning confidence: 99%
“…Liu et al. [12] have proposed a Petri net-based approach to model and analyze data flows. Ramon-Cortes et al. [6] build a Distributed Stream Library supporting the integration of workflows and dataflows to meet the needs of new Data Science workflows. Xiang et al. [13] have proposed a PN-DOS model that reduces the reachability graph to quickly detect data flow errors and ensure the correctness of business processes…”
Section: Application of Data Flow
Mentioning confidence: 99%
“…However, developers frequently use stream-oriented, low-level frameworks such as Apache Kafka [25], or dataflow models like Apache Storm [32], Apache Spark-Streaming [36], or Heron [26]. Apache Beam [4], COMPSs [30], and Twister2 [24] have gone a step further, aiming to merge workflows and dataflows in a single solution…”
Section: Related Work
Mentioning confidence: 99%
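The excerpt above distinguishes merged workflow/dataflow solutions from stream-oriented, low-level frameworks such as Apache Kafka, where the developer wires up the consumption loop by hand. A minimal sketch of that low-level style is shown below; it assumes the third-party kafka-python client, and the broker address, topic name, and message payload are placeholders.

```python
# Low-level, stream-oriented style: the developer manages the consumer
# loop, deserialization, and processing logic by hand.
# Requires the third-party kafka-python package; broker, topic, and payload
# layout are placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "simulation-results",                      # placeholder topic name
    bootstrap_servers="localhost:9092",        # placeholder broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

running_sum = 0
for record in consumer:
    running_sum += record.value["value"]       # assumes a {"value": ...} payload
    print("partial sum:", running_sum)
```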
“…Converting the complex logic of the computation into a hybrid workflow, supporting both atomic and continuous-processing tasks, makes it possible to parallelize and distribute the workload across the whole platform. For testing purposes and without loss of generality, our prototype leverages COMPSs [27,30] to make this conversion, and we modified the COMPSs runtime to delegate the execution of the nested tasks to Colony. Nevertheless, other programming models following a task-based approach, such as Swift [34] or Kepler [15], could also be integrated into the framework with the appropriate glue software…”
Section: Introduction
Mentioning confidence: 99%
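The excerpt above describes a hybrid workflow whose coarse-grained (atomic) tasks spawn nested tasks that the modified COMPSs runtime delegates to Colony. The sketch below only illustrates the nesting idea using Python's standard concurrent.futures; it is not the COMPSs or Colony API, and the task names and granularities are invented for illustration.

```python
# Conceptual illustration of nested tasks: a coarse-grained (atomic) task
# fans out fine-grained work, which a hybrid runtime could delegate to
# another execution platform. Plain concurrent.futures, not COMPSs/Colony.
from concurrent.futures import ThreadPoolExecutor


def fine_grained(chunk):
    # Fine-grained nested task: the piece a runtime would offload.
    return sum(x * x for x in chunk)


def coarse_grained(data, pool):
    # Atomic task whose body spawns nested tasks.
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    partials = pool.map(fine_grained, chunks)
    return sum(partials)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(coarse_grained(list(range(16)), pool))   # 1240
```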
“…Extensions of task-based programming to distributed programming, such as PyCOMPSs [23], [24], Dask [25], Ray [26], Parsl [27], and Pygion [28], are gaining popularity for scientific data analysis thanks to the mix of performance and simplicity they offer. They provide a Python interface and often transparent parallelization of some classical APIs (or parts of them), such as NumPy or Pandas…”
Section: Related Work
Mentioning confidence: 99%
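As a concrete instance of the transparent parallelization mentioned in the excerpt above, a Dask array exposes a NumPy-like interface while the library splits the computation into chunked tasks scheduled in parallel; the array shape and chunking below are arbitrary example values.

```python
# NumPy-like code that Dask parallelizes transparently: each chunk becomes
# a task in a task graph executed by the scheduler.
import dask.array as da

x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))  # 100 chunked blocks
result = (x + x.T).mean()      # lazily builds a task graph with NumPy-style syntax
print(result.compute())        # triggers the parallel execution
```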