2018
DOI: 10.1016/j.future.2018.05.051
Transferring a petabyte in a day

Cited by 33 publications (12 citation statements)
References 16 publications
“…Because storage-storage and storage-WAN transfers tend to be nonblocking (e.g., Globus transfers are "fire and forget," burst buffer staging happens asynchronously, and HPSS data movement is managed through a batch queue at NERSC), we hypothesize that users may have less incentive to follow best practices for high-performance parallel I/O and instead initiate small-file transfers that are known to cause suboptimal end-to-end performance [5], [23].…”
Section: Site-wide Transfer Behavior
Citation type: mentioning; confidence: 99%
“…High-performance computing (HPC) has historically been dominated by modeling and simulation workflows whose I/O needs are largely driven by checkpoint/restart. However, the role of HPC is rapidly expanding to include large-scale data analysis as a result of both increased data generation rates from modern scientific instruments and the emergence of artificial intelligence as a technique to rapidly extract insight from large volumes of data [1]-[5]. Hence, characterizing the requirements and performance of I/O in modern HPC centers now requires a more holistic examination of data movement.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…Exascale systems are expected to generate an unprecedented amount of data in the upcoming decade. For example, cosmology applications such as HACC (Habib et al., 2013) generate 40 PB of data during a single trillion-particle simulation (Kettimuthu et al., 2018). The Community Earth System Model (CESM), which simulates the Earth’s past, present, and future climate states, produces 2.5 PB of raw data, which further leads to an estimate of about 12 PB of raw data output in CMIP6 (Paul et al., 2015).…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…In particular, data access and sharing has become more important for many HPC applications in the big data era. For example, a single cosmology application generates 20 PB of data, which is exchanged among dispersed computing facilities [1]. Identifying I/O and network bottlenecks should thus be one of the first-class requirements to improve efficiency and scalability in HPC systems.…”
Section: Introduction
Citation type: mentioning; confidence: 99%
“…Rather than simply relying on machine learning, we manually analyzed the log data to find out potential behavioral correlations that could cause performance bottlenecks. We present our initial observations from the analysis of one-week (January 1-6, 2018) HPC log data sets collected from a NERSC Cori HPC system. The data set in this study includes the metadata server CPU load, file system logs sampled once every 5 seconds, and TCP connection records.…”
Section: Introduction
Citation type: mentioning; confidence: 99%