A Scalable Architecture for Cooperative Web Caching

Lancellotti, Riccardo; Ciciani, Bruno; Colajanni, Michele

doi:10.1007/3-540-45745-3_3

Cited by 4 publications (3 citation statements)

References 15 publications

“…The approach is reminiscent of cooperative caching [18], cooperative web-caching [19], and peer-to-peer storage systems [17]. (Other data-aware scheduling approaches tend to assume static resources [1,2].)…”

Section: Introduction (mentioning; confidence: 99%)

Accelerating large-scale data exploration through data diffusion

Raicu, Zhao, Foster, et al. (2008)

Proceedings of the 2008 International Workshop on Data-Aware Distributed Computing

Data-intensive applications often require exploratory analysis of large datasets. If analysis is performed on distributed resources, data locality can be crucial to high throughput and performance. We propose a "data diffusion" approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on workload and resource characteristics. The approach is reminiscent of cooperative caching, web-caching, and peer-to-peer storage systems, but addresses different application demands. Other data-aware scheduling approaches assume dedicated resources, which can be expensive and/or inefficient if load varies significantly. To explore the feasibility of the data diffusion approach, we have extended the Falkon resource provisioning and task scheduling system to support data caching and data-aware scheduling. Performance results from both microbenchmarks and a large-scale astronomy application demonstrate that our approach improves performance relative to alternative approaches and provides improved scalability, as aggregated I/O bandwidth scales linearly with the number of data cache nodes.

“…Data diffusion thus involves a combination of dynamic resource provisioning, data caching, and data-aware scheduling. The approach is reminiscent of cooperative caching [18], cooperative web-caching [19], and peer-to-peer storage systems [17]. (Other data-aware scheduling approaches tend to assume static resources [1,2].)…”

Section: Introduction (mentioning; confidence: 99%)

Accelerating large-scale data exploration through data diffusion

Raicu, Zhao, Foster, et al. (2008)

Proceedings of the 2008 International Workshop on Data-Aware Distributed Computing

“…Data diffusion involves a combination of dynamic resource provisioning, data caching, and data-aware scheduling. The approach is reminiscent of cooperative caching [27], cooperative web-caching [28], and peer-to-peer storage systems [29]. Other data-aware scheduling approaches tend to assume static resources [30,31], in which a system configuration dedicates nodes with roles (i.e.…”

Section: Introduction (mentioning; confidence: 99%)

Towards Data Intensive Many-Task Computing

Raicu, Foster, Zhao, et al.

Advances in Systems Analysis, Software Engineering, and High Performance Computing

Many-task computing aims to bridge the gap between two computing paradigms, high-throughput computing and high-performance computing. Traditional techniques to support many-task computing commonly found in scientific computing (i.e., the reliance on parallel file systems with static configurations) do not scale to today’s largest systems for data-intensive applications, as the number of processors per system is growing faster than the performance of parallel file systems. In this chapter, the authors argue that in such circumstances, data locality is critical to the successful and efficient use of large distributed systems for data-intensive applications. They propose a “data diffusion” approach to enable data-intensive many-task computing. They define an abstract model for data diffusion, define and implement scheduling policies with heuristics that optimize real-world performance, and develop a competitive online cache eviction policy. They also present many empirical experiments that explore the benefits of data diffusion under both static and dynamic resource provisioning, demonstrating approaches that improve both performance and scalability.