Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing 2010
DOI: 10.1145/1851476.1851585

File-access patterns of data-intensive workflow applications and their implications to distributed filesystems

Abstract: This paper studies five real-world data intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments including job dispatchers, schedulers, file systems, and file staging tools. The keys to achieving high performance are efficient data sharing among executin…

Cited by 26 publications (12 citation statements)
References 28 publications (33 reference statements)

“…To this end, this section focuses on pipeline, reduce, and broadcast patterns (Figure 2). These are among the most used patterns uncovered by studying over 20 scientific workflow applications by Wozniak et al [20], Shibata et al [15], and Bharathi et al [8]). We note that, these benchmarks are designed to explore the limitations of the predictor and be a worst case in terms of accuracy, and are harder to predict accurately than real applications as they are composed exclusively of I/O operations, which leads to contention in the real storage system.…”
Section: Synthetic Benchmarks For Workflow Patterns
Citation type: mentioning
confidence: 89%
“…Assembling workflow applications by putting together standalone binaries has become a popular approach to support large-scale science [8,15,20]. The processes spawned from these binaries communicate via temporary files stored on a shared storage system.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Not unlike many software applications, workflows involve several, often disparate systems or components which individually analyze input data, make specific calculations, and produce output data which is often then used by another component in the workflow [8]. Shibata presents a study of five workflow applications and their associated costs [12]. Deelman studied costs of scientific workflows running in the cloud [6], and Dun profiled data intensive workflows in a similar way to the detection approach for measurement covered in the current work [7].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…(a) hardware design or architectural improvements [2], [3], (b) new storage and memory management techniques [4], [5], and (c) algorithms to optimize data-intensive applications [6], [7]. The science of workflows has emerged to simplify the complex scientific processes by step-wise representation in the form of workflows [8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%