Proceedings of the 2021 Workshop on Systems and Network Telemetry and Analytics (SNTA '21)
DOI: 10.1145/3452411.3464441

Analyzing Scientific Data Sharing Patterns for In-network Data Caching

Abstract: The volume of data moving through a network increases with new scientific experiments and simulations. Network bandwidth requirements also increase proportionally to deliver data within a certain time frame. We observe that a significant portion of popular datasets is transferred multiple times to different users as well as to the same user for various reasons. In-network data caching of shared data has been shown to reduce redundant data transfers and consequently save network traffic volume. In additi…
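As a rough illustration of the saving described in the abstract, the sketch below replays a hypothetical transfer log and counts how many bytes an idealized in-network cache would keep off the wide-area network. The file names, sizes, and log format are assumptions for illustration, not data from the paper.

    def redundant_traffic(transfers):
        """Return (total_bytes, bytes_saved) for an idealized shared cache."""
        seen = set()
        total = saved = 0
        for file_id, size in transfers:
            total += size
            if file_id in seen:
                saved += size      # repeat transfer: could be served from the cache
            else:
                seen.add(file_id)  # first transfer: must cross the wide-area network
        return total, saved

    # Hypothetical transfer log: (file path, size in bytes)
    log = [("run2/evt001.root", 2_000_000_000),
           ("run2/evt002.root", 1_500_000_000),
           ("run2/evt001.root", 2_000_000_000),   # re-read by another user
           ("run2/evt001.root", 2_000_000_000)]   # re-read by the same user

    total, saved = redundant_traffic(log)
    print(f"{saved / total:.0%} of the transferred volume was redundant")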

Cited by 5 publications (4 citation statements)
References 18 publications
“…The caching approach improves overall application performance by decreasing data access latency and increasing data access throughput. It also reduces traffic over the wide-area network by decreasing the number of repeated data transfers [10][11][12].…”
Section: Introduction
confidence: 99%
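The latency and throughput argument in the statement above can be made concrete with a small simulation. The LRU policy, cache capacity, access trace, and the 5 ms / 200 ms hit and miss latencies below are all assumed for illustration and do not come from the cited works.

    from collections import OrderedDict

    HIT_MS, MISS_MS = 5, 200          # assumed latencies: local hit vs wide-area fetch

    def simulate(accesses, capacity):
        """Replay an access trace through a small LRU cache."""
        cache, total_ms, hits = OrderedDict(), 0, 0
        for name in accesses:
            if name in cache:
                cache.move_to_end(name)           # LRU: mark as most recently used
                total_ms, hits = total_ms + HIT_MS, hits + 1
            else:
                total_ms += MISS_MS               # miss: fetch over the WAN
                cache[name] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)     # evict the least recently used file
        return hits / len(accesses), total_ms / len(accesses)

    trace = ["a", "b", "a", "c", "a", "b", "d", "a"]   # hypothetical access trace
    hit_ratio, avg_ms = simulate(trace, capacity=3)
    print(f"hit ratio {hit_ratio:.0%}, mean access latency {avg_ms:.0f} ms")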
“…To take advantage of this reuse, the High-Energy Physics (HEP) community has established a number of regional storage caches [6,7,13]. Analyses show that these caches could significantly reduce the data access latency as well as the traffic on the internet backbone [4].…”
Section: Introduction
confidence: 99%
“…Adding more cache nodes to an already full distributed cache invariably leads to skewed distributions of data access patterns. This happened around Aug. 26, 2021, when 7 new nodes at Caltech (xrd 3-8 and 11) were added to the system, and around Sep. 30, 2021, when 2 new nodes at Caltech (xrd 9-10) were added to the system. The new cache nodes get the new data.…”
Section: Introduction
confidence: 99%
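A toy placement model makes the skew described above easy to see: if files cached before an expansion were spread only over the original nodes, while files requested afterwards are hashed across the enlarged set, the newly added nodes end up holding only post-expansion data. The node names and the hash-modulo policy below are assumptions for illustration, not the production cache's placement logic.

    import hashlib

    def pick_node(path, nodes):
        """Deterministically map a file path to one cache node."""
        digest = hashlib.sha1(path.encode()).hexdigest()
        return nodes[int(digest, 16) % len(nodes)]

    old_nodes = ["xrd1", "xrd2"]                                # hypothetical pre-expansion nodes
    new_nodes = [f"xrd{i}" for i in range(3, 9)] + ["xrd11"]    # nodes added later

    contents = {n: {"old": 0, "new": 0} for n in old_nodes + new_nodes}
    for i in range(5_000):      # files already cached before the expansion
        contents[pick_node(f"old/file{i}", old_nodes)]["old"] += 1
    for i in range(5_000):      # files first requested after the expansion
        contents[pick_node(f"new/file{i}", old_nodes + new_nodes)]["new"] += 1

    for node, mix in contents.items():
        print(node, mix)        # the added nodes hold only post-expansion data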
“…As a first step, this can be set up independently on any machine that should cache data [138]. Furthermore, it can also be used to coordinate multiple caching nodes forming a federated system (a technology called XCache, commonly used in grid sites [139,140]). Although this caching system is common in HEP computing environments, it is rarely used directly by the analysis framework; it is usually activated at the level of the grid site instead.…”
Section: State of the Art
confidence: 99%
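The two deployment modes mentioned in the statement above, an independent cache on a single machine versus a federated set of caching nodes behind a redirector, can be sketched with a minimal model. This is an illustration of the architecture only, not the XRootD/XCache implementation, and the class and node names are hypothetical.

    import hashlib

    class CacheNode:
        """One caching machine: serves local copies, pulls misses from the origin."""
        def __init__(self, name, origin):
            self.name, self.origin, self.store = name, origin, {}

        def read(self, path):
            if path not in self.store:            # miss: fetch from the origin server
                self.store[path] = self.origin[path]
            return self.store[path]

    class Federation:
        """Redirector that partitions the namespace across several cache nodes."""
        def __init__(self, nodes):
            self.nodes = nodes

        def read(self, path):
            idx = int(hashlib.sha1(path.encode()).hexdigest(), 16) % len(self.nodes)
            return self.nodes[idx].read(path)

    origin = {"/store/data/a.root": b"payload-a", "/store/data/b.root": b"payload-b"}
    standalone = CacheNode("edge-cache", origin)                       # per-machine cache
    federated = Federation([CacheNode(f"xcache{i}", origin) for i in range(3)])
    print(standalone.read("/store/data/a.root") == federated.read("/store/data/a.root"))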