In high-performance computing (HPC) environments, an appropriate amount of hardware resources must be allocated to achieve the best parallel I/O performance. For this reason, HPC users are provided with tunable parameters that change the HPC configuration and control the amount of resources used. However, some users are unaware of the relationship between parallel I/O performance and the HPC configuration and thus fail to utilize these parameters. Even users who know the relationship must run an application under every parameter combination to find the best-performing setting, because each application shows different performance trends under different configurations. This paper presents an analysis of parallel I/O performance trends that helps HPC users find the best configurations with minimal effort. We divide parallel I/O into independent and collective I/O and measure I/O throughput under various configurations using a synthetic workload, the IOR benchmark. Through this analysis, we find that parallel I/O performance is determined by the trade-off between the gain from the parallelism of an increased number of OSTs and the loss from contention for shared resources, and that this performance trend differs depending on the I/O characteristic. Our evaluation shows that real HPC applications exhibit performance trends similar to those in our analysis.
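The measurement methodology described above can be sketched as follows. This is a minimal illustration, not the paper's exact setup: it assumes a Lustre mount (so the `lfs setstripe` stripe count controls the number of OSTs), an MPI-IO build of IOR, and the common IOR 3.x summary-line format; the specific flag values and process count are placeholders.

```python
import re
import subprocess

def parse_ior_bandwidth(ior_output: str) -> dict:
    """Extract Max Write/Read bandwidth (MiB/s) from an IOR summary.

    Assumes the 'Max Write:'/'Max Read:' lines of IOR 3.x; other
    versions may format the summary differently.
    """
    results = {}
    for op in ("Write", "Read"):
        m = re.search(rf"Max {op}:\s+([\d.]+)\s+MiB/sec", ior_output)
        if m:
            results[op.lower()] = float(m.group(1))
    return results

def sweep_stripe_counts(test_dir: str, stripe_counts=(1, 2, 4, 8, 16)):
    """Run IOR once per Lustre stripe count (i.e., number of OSTs).

    Requires a Lustre file system, the 'lfs' tool, and IOR with MPI-IO.
    '-c' selects collective I/O; dropping it gives independent I/O.
    """
    for count in stripe_counts:
        subprocess.run(["lfs", "setstripe", "-c", str(count), test_dir],
                       check=True)
        out = subprocess.run(
            ["mpirun", "-np", "16", "ior", "-a", "MPIIO", "-c",
             "-t", "1m", "-b", "64m", "-o", f"{test_dir}/testfile"],
            capture_output=True, text=True, check=True).stdout
        print(count, parse_ior_bandwidth(out))

# Parsing demo on a mocked summary (a real sweep needs a Lustre system):
sample = "Max Write: 1523.87 MiB/sec (1597.84 MB/sec)\nMax Read: 2210.45 MiB/sec"
print(parse_ior_bandwidth(sample))
```

Plotting the parsed bandwidth against the stripe count exposes the trade-off the abstract describes: throughput rises with OST parallelism until shared-resource contention starts to dominate.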
To avoid accessing the parallel file system (PFS), dedicated burst buffer (BB) allocation is preferred despite severe BB underutilization. Recently, new all-flash HPC storage systems that integrate the BB and PFS have been proposed, which speed up access to the PFS. For this reason, we adopt a BB over-subscription allocation method that allows HPC applications to use the BB only during I/O phases, improving BB utilization. Unfortunately, BB over-subscription aggravates I/O interference and the overhead of demoting data from the BB to the PFS, resulting in degraded performance. To minimize this degradation, we develop an I/O scheduler that prevents I/O congestion and a new transparent data management system based on the checkpoint/restart characteristics of HPC applications. With the proposed approach, BB utilization is improved while applications retain high performance. In our experiments, BB utilization improves by at least 2.2x, and checkpoint performance is higher and more stable than with other approaches. In addition, we achieve up to a 96.4% hit ratio for restart requests on the BB and up to 3.1x higher restart performance than other approaches.
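The abstract does not detail the scheduler's policy. As a generic illustration of one way an I/O scheduler can prevent congestion on an over-subscribed burst buffer, the sketch below uses admission control: at most N applications run their I/O phase concurrently, and the rest block until a slot frees up. All names here are hypothetical and this is not the paper's design.

```python
import threading
import time

class BBAdmissionScheduler:
    """Toy admission-control scheduler for a shared burst buffer.

    Caps the number of concurrent I/O phases so over-subscription
    does not turn into uncontrolled I/O interference.
    """
    def __init__(self, max_concurrent_io: int):
        self._slots = threading.Semaphore(max_concurrent_io)
        self._active = 0
        self._peak = 0
        self._lock = threading.Lock()

    def run_io_phase(self, io_phase):
        with self._slots:              # block until a BB slot is free
            with self._lock:
                self._active += 1
                self._peak = max(self._peak, self._active)
            try:
                io_phase()             # the application's checkpoint I/O
            finally:
                with self._lock:
                    self._active -= 1

    @property
    def peak_concurrency(self) -> int:
        return self._peak

# Eight "applications" contend for two BB slots.
sched = BBAdmissionScheduler(max_concurrent_io=2)
threads = [threading.Thread(target=sched.run_io_phase,
                            args=(lambda: time.sleep(0.05),))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent I/O phases:", sched.peak_concurrency)
```

Because applications occupy the BB only while an I/O phase is admitted, the same capacity can be time-shared across many more jobs than dedicated allocation allows.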
The amount of data generated by scientific applications on high-performance computing systems is growing at an ever-increasing pace. Most of the generated data are transferred to storage in remote systems for purposes such as backup, replication, or analysis. To detect data corruption caused by network or storage failures during data transfer, the receiver system verifies data integrity by comparing checksums of the data. However, existing end-to-end integrity verification techniques do not sufficiently account for the internal operation of the storage device. In this paper, we propose a concurrent and reliable end-to-end data integrity verification scheme that considers the internal operation of storage devices, targeting data transfer between high-performance computing systems with flash-based storage devices. To verify data integrity, including corruption that occurs inside the storage devices, we control the order of I/O operations in accordance with the devices' internal operations. To prove the effectiveness of the proposed scheme, we also devise a prototype that injects faults into specific layers of the storage stack and checks that they are detected. We parallelize checksum computation and overlap it with I/O operations to mitigate the overhead caused by I/O reordering. The experimental results show that the proposed scheme reduces total data transfer time by up to 62% compared with existing schemes while ensuring robust data integrity. With the prototype implementation, our scheme detects failures in NAND flash memory inside storage devices that cannot be detected by existing schemes.
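The idea of parallelizing checksum computation and overlapping it with I/O can be sketched as below. This is a minimal illustration under assumptions of my own: SHA-256 per fixed-size chunk and a thread pool for the overlap; the paper's actual checksum algorithm, chunking, and I/O-ordering control are not reproduced here.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB; the chunk size is an assumption

def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def chunk_checksums_overlapped(path: str, workers: int = 4) -> list:
    """Read a file chunk by chunk and hash each chunk in a worker pool,
    so checksum computation overlaps with the subsequent read I/O."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                # Hand the chunk to a worker; the next read starts
                # immediately instead of waiting for the hash.
                futures.append(pool.submit(_digest, chunk))
        return [fut.result() for fut in futures]  # digests in file order

# A receiver would recompute the same per-chunk digests after its own
# writes complete and compare them to detect corruption introduced in
# transit or inside the storage device.
```

Comparing per-chunk digests rather than one whole-file checksum also localizes a detected corruption to a single chunk, so only that chunk needs retransmission.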