Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing 2009
DOI: 10.1145/1551609.1551642

The quest for scalable support of data-intensive workloads in distributed systems

Abstract: Data-intensive applications involving the analysis of large datasets often require large amounts of compute and storage resources, for which data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; …

Cited by 43 publications (46 citation statements)
References 21 publications
“…FusionFS is optimized for a subset of HPC and many-task computing (MTC) [12,59,62,63] workloads, and it is designed for extreme scales [61]. These workloads are often extremely data-intensive [56,58,60], and optimizing data locality [55] becomes critical to achieving good scalability and performance. In FusionFS, every compute node serves all three roles: client, metadata server, and storage server.…”
Section: A FusionFS: Distributed Metadata Management (mentioning)
confidence: 99%
“…In order to measure efficiency, we investigated the largest available trace of real MTC workloads [38], and filtered out the logs to isolate only the sub-second tasks, which netted about 2.07M tasks with the runtime range of 1 milliseconds to 1 seconds. The tasks were submitted in a random fashion.…”
Section: Heterogeneous Workloads (mentioning)
confidence: 99%
“…TABLE III To the best of our knowledge, HyCache is the first user-level POSIX-compliant hybrid caching for distributed file systems. Some of our previous work [15][16][17] proposed data caching to accelerate applications by modifying the applications and/or their workflow, rather than the at the filesystem level. Other existing work requires modifying OS kernel, or lacks of a systematic caching mechanism for manipulating files across multiple storage devices, or does not support the POSIX interface.…”
Section: Application (mentioning)
confidence: 99%