2013 IEEE International Conference on Big Data
DOI: 10.1109/bigdata.2013.6691724

Rethinking data management for big data scientific workflows

Abstract: Scientific workflows consist of tasks that operate on input data to generate new data products that are used by subsequent tasks. Workflow management systems typically stage data to computational sites before invoking the necessary computations. In some cases data may be accessed using remote I/O. There are limitations with these approaches, however. First, the storage at a computational site may be limited and not able to accommodate the necessary input and intermediate data. Second, even if there is enough s…

Cited by 30 publications
(24 citation statements)
References 33 publications
“…However, it does not support raw data file access. Similarly, current solutions for provenance, such as Pegasus Lite [11] and Wings [22], do not address dataflow queries as we show in this paper. Chiron implements a datacentric workflow algebra.…”
Section: Dataflow Management At the Logical Level
Citation type: mentioning
confidence: 81%
“…The PegasusLite [46] workflow engine was developed for the case in which jobs are executed in a non-shared filesystem environment. In such deployments, the worker nodes on a cluster do not share a file system between themselves or between them and the data staging server.…”
Section: Non-shared File System Execution Engine
Citation type: mentioning
confidence: 99%
“…[17] A task level challenge for workflow management systems is to develop a flexible data management solution that allows for late binding of data. Tasks can discover input data at runtime, and possibly choose to stage the data from one of many locations.…”
Section: Instantiation Time
Citation type: mentioning
confidence: 99%