Modeling I/O performance variability in high-performance computing systems using mixture distributions

Xu, Li; Wang, Yueyao; Lux, Thomas; Chang, Tyler H.; Bernard, Jon; Li, Bo; Hong, Yili; Cameron, Kirk W.; Watson, Layne T.

doi:10.1016/j.jpdc.2020.01.005

Cited by 11 publications

(6 citation statements)

References 14 publications


“…We focus on the prediction of the standard deviation of the throughput in this paper. Xu et al (2020) show that the distribution of the throughput is multi-modal and thus it is complicated. A more ambitious goal is to predict the system throughput distribution generally (e.g., Lux et al 2018).…”

Section: System Optimization Results (mentioning)

confidence: 99%

Prediction of High-Performance Computing Input/Output Variability and Its Application to Optimization for System Configurations

Lux, Chang, et al. 2020

Preprint

Self Cite


Performance variability is an important measure for a reliable high-performance computing (HPC) system. Performance variability is affected by complicated interactions between numerous factors, such as CPU frequency, the number of input/output (IO) threads, and the IO scheduler. In this paper, we focus on HPC IO variability. The prediction of HPC variability is a challenging problem in the engineering of HPC systems, and there is little statistical work on this problem to date. Although there are many methods available in the computer experiment literature, the applicability of existing methods to HPC performance variability needs investigation, especially when the objective is to predict performance variability in both interpolation and extrapolation settings. A data analytic framework is developed to model data collected from large-scale experiments. Various promising methods are used to build predictive models for the variability of HPC systems. We evaluate the performance of the methods by measuring prediction accuracy at previously unseen system configurations. We also discuss a methodology for optimizing system configurations that uses the estimated variability map. The findings from the method comparisons and the tool sets developed in this paper yield new insights into existing statistical methods and can benefit the practice of HPC variability management. This paper has supplementary materials online.
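As a rough illustration of a "variability map" and its use in choosing a configuration, the sketch below summarizes repeated throughput runs at each configuration by their sample standard deviation and then picks the least variable configuration. The factor names follow the abstract (CPU frequency, IO threads, IO scheduler), but the levels and throughput values are invented. The paper builds its map from fitted predictive models, not from raw per-configuration statistics as done here.

```python
import statistics

# Hypothetical repeated throughput measurements (arbitrary units), keyed by
# (cpu_freq_ghz, io_threads, io_scheduler). These are NOT real IOzone data.
runs = {
    (1.2, 8, "CFQ"): [410.0, 395.0, 402.0, 420.0],
    (2.0, 8, "NOOP"): [505.0, 300.0, 498.0, 310.0],
    (2.3, 16, "DEADLINE"): [450.0, 452.0, 449.0, 455.0],
}


def variability_map(runs):
    """Map each configuration to the sample standard deviation of its runs."""
    return {cfg: statistics.stdev(tp) for cfg, tp in runs.items()}


def least_variable(vmap):
    """Return the configuration with the smallest estimated variability."""
    return min(vmap, key=vmap.get)


vmap = variability_map(runs)
best = least_variable(vmap)
print(best, round(vmap[best], 2))
```

In a real study the dictionary would be replaced by model predictions at configurations that were never measured, which is where the interpolation and extrapolation accuracy discussed in the abstract matters.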


“…In particular, the KS distance can be misleading when a CDF F(x) has a steep behavior (i.e., throughputs have multiple modes), and Xu et al (2020) show that multimodal behaviors commonly exist through the IOzone data. The KS distance measures the maximal error while EL₁ provides an average discrepancy.…”

Section: Conclusion and Areas For Future Research (mentioning)

confidence: 99%
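To make the contrast between a maximal error and an average discrepancy concrete, the sketch below compares two empirical CDFs on a common grid. It assumes EL₁ can be approximated by the mean absolute CDF difference over an evenly spaced grid; the citing paper's exact definition may integrate or weight differently. Both throughput samples are invented.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ks_and_avg_l1(f, g, grid):
    """KS distance (largest gap) and mean absolute gap between two CDFs on a grid."""
    gaps = [abs(f(x) - g(x)) for x in grid]
    return max(gaps), sum(gaps) / len(gaps)


# Hypothetical bimodal "observed" throughputs and a prediction that is
# shifted by a few units only in the upper mode.
observed = [100, 101, 102, 103, 300, 301, 302, 303]
predicted = [100, 101, 102, 103, 306, 307, 308, 309]
grid = [float(x) for x in range(90, 320)]

ks, avg_l1 = ks_and_avg_l1(ecdf(observed), ecdf(predicted), grid)
print(ks, round(avg_l1, 4))
```

Because the CDFs are steep around each mode, a small shift makes them disagree by 0.5 at the worst point, while the average gap over this grid is only 3/230 ≈ 0.013. The average also depends on the grid range chosen, which is one reason to report both measures.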

“…For example, Cameron et al (2019) study the standard deviation of the IOzone throughput. Xu et al (2020) show that the throughput distribution is multimodal so a summary statistic like standard deviation cannot represent the system variability. As an illustration, Figure 1(b) shows the histograms of the I/O throughput under four specific HPC system configurations.…”

Section: Introduction (mentioning)

confidence: 99%
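One way to see why a standard deviation alone can hide multimodality is to compare a two-component Gaussian mixture with a single normal distribution that has the same mean and standard deviation. The sketch assumes Gaussian components with invented parameters (modes at 90 and 110, component SD 3). It computes the mixture's moments with the law of total variance and counts density peaks on a grid.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Density of a finite Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))


def mixture_moments(weights, means, sds):
    """Mean and SD of a Gaussian mixture via the law of total variance."""
    mean = sum(w * m for w, m in zip(weights, means))
    second = sum(w * (s ** 2 + m ** 2) for w, m, s in zip(weights, means, sds))
    return mean, math.sqrt(second - mean ** 2)


def count_modes(values):
    """Count strict local maxima in a sequence of density values."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])


# Hypothetical throughput model: two equally weighted modes at 90 and 110.
weights, means, sds = [0.5, 0.5], [90.0, 110.0], [3.0, 3.0]
mean, sd = mixture_moments(weights, means, sds)

grid = [60 + 0.1 * i for i in range(801)]  # 60.0 to 140.0
mix_modes = count_modes([mixture_pdf(x, weights, means, sds) for x in grid])
normal_modes = count_modes([normal_pdf(x, mean, sd) for x in grid])
print(mean, round(sd, 3), mix_modes, normal_modes)
```

Both distributions have mean 100 and SD √109 ≈ 10.44, yet the mixture has two peaks and the normal has one, so a model that predicts only the SD cannot tell these two behaviors apart.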

Prediction for Distributional Outcomes in High-Performance Computing I/O Variability

Li, Hong, Morris, et al. 2022

Preprint

Self Cite


Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge and has become a critical research topic in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management and is nontrivial because one needs to predict a distribution function based on system factors. In this paper, we propose a new framework to predict performance distributions. The proposed model is a modified Gaussian process that can predict the distribution function of the input/output (I/O) throughput under a specific HPC system configuration. We also impose a monotonic constraint so that the predicted function is nondecreasing, which is a property of the cumulative distribution function. Additionally, the proposed model can incorporate both quantitative and qualitative input variables. We evaluate the performance of the proposed method on the IOzone variability data across various prediction tasks. Results show that the proposed method generates accurate predictions and outperforms existing methods. We also show how the predicted functional output can be used to generate predictions for scalar summaries of the performance distribution, such as the mean, standard deviation, and quantiles. Our methods can further be used as a surrogate model for HPC system variability monitoring and optimization.
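The last two ideas in this abstract, a nondecreasing CDF and scalar summaries read off it, can be sketched for a CDF predicted on a grid of throughput values. This is not the paper's modified Gaussian process: as a plainly different and much simpler stand-in, it repairs monotonicity after the fact by clipping to [0, 1] and taking a running maximum, then computes quantiles, the mean, and the SD. The grid and the raw predicted values are hypothetical.

```python
import math
from itertools import accumulate


def monotone_cdf(values):
    """Clip to [0, 1] and take a running maximum so the CDF is nondecreasing.

    A simple post-hoc repair, not the constrained GP described in the abstract.
    """
    return list(accumulate((min(1.0, max(0.0, v)) for v in values), max))


def quantile(grid, cdf, p):
    """Smallest grid point whose CDF value is at least p (last point if none)."""
    for x, f in zip(grid, cdf):
        if f >= p:
            return x
    return grid[-1]


def mean_and_sd(grid, cdf):
    """Mean and SD, treating each CDF increment as a point mass at its grid point.

    Mass at or below the first grid point is lumped onto it, and the masses are
    renormalized in case the CDF does not reach 1 on the grid.
    """
    masses = [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])]
    total = sum(masses)
    mean = sum(x * m for x, m in zip(grid, masses)) / total
    var = sum(m * (x - mean) ** 2 for x, m in zip(grid, masses)) / total
    return mean, math.sqrt(var)


# Hypothetical raw prediction with a dip (0.125) and an overshoot (1.0625).
grid = [100.0, 200.0, 300.0, 400.0]
cdf = monotone_cdf([0.25, 0.125, 0.75, 1.0625])
median = quantile(grid, cdf, 0.5)
mean, sd = mean_and_sd(grid, cdf)
print(cdf, median, mean, round(sd, 2))
```

A coarse grid makes these summaries coarse as well; on a fine grid the same point-mass sums approach the integrals that define the mean and variance.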


“…In the data collection stage, researchers identify HPC system settings for which I/O throughput data should be collected. Computer scientists often use grid-based designs (GBDs) to collect data under numerous possible system configurations, when the number of factors is relatively small (Cameron et al 2019, Xu et al 2020). Note that the GBDs are equivalent to full factorial designs.…”

mentioning

confidence: 99%
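Because a grid-based design is a full factorial design, its list of runs is the Cartesian product of the factor levels. The sketch below uses three factors named elsewhere on this page (CPU frequency, number of I/O threads, I/O scheduler); the particular levels are hypothetical.

```python
from itertools import product

# Hypothetical factor levels; a GBD crosses every level of every factor.
factors = {
    "cpu_freq_ghz": [1.2, 1.6, 2.0, 2.3],
    "io_threads": [1, 2, 4, 8, 16],
    "io_scheduler": ["CFQ", "DEADLINE", "NOOP"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(factors)
print(len(design))  # 4 * 5 * 3 combinations
print(design[0])
```

The run count is the product of the level counts (here 4 × 5 × 3 = 60), and repeated runs per configuration multiply it again, which is why such designs are practical mainly when the number of factors is relatively small.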

“…Another commonly-used statistical approximation model is Gaussian process (GP) regression (e.g., Sacks et al 1989, Currin et al 1991), which can generate a smooth surface and be capable of dealing with the heteroscedasticity (Goldberg, Williams, and Bishop 1998) in the response variable. In the HPC community, mixture models have been used to study the multimodal behavior of the throughput distribution (Xu et al 2020). Some novel numerical techniques, including max box mesh, iterative box mesh, and Voronoi mesh methods for interpolation, are investigated by .…”

mentioning

confidence: 99%
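Mixture models such as those attributed to Xu et al (2020) above are commonly fitted by expectation-maximization (EM). The following is a generic two-component Gaussian EM in plain Python, offered only as an illustration: the Gaussian component family, the order-statistic initialization, and the fixed iteration count are assumptions and need not match the cited work. The sample is simulated, not measured.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_gaussian_mixture(data, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    xs = sorted(data)
    n = len(xs)
    # Start the means at evenly spaced order statistics, with equal weights
    # and the pooled standard deviation for every component.
    means = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    sds = [statistics.pstdev(xs)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = max(sum(p), 1e-300)  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted updates of weights, means and SDs.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))
    return weights, means, sds


# Hypothetical bimodal throughput sample (simulated, not real measurements).
rng = random.Random(42)
sample = [rng.gauss(200, 10) for _ in range(300)] + [rng.gauss(500, 15) for _ in range(200)]
weights, means, sds = fit_gaussian_mixture(sample)
print([round(w, 3) for w in weights], [round(m, 1) for m in means])
```

On this well-separated sample the fitted means land close to the simulated modes of 200 and 500, with weights near 0.6 and 0.4. Choosing the number of components (for example by an information criterion) is left out of the sketch.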

Design Strategies and Approximation Methods for High-Performance Computing Variability Management

Wang, Li, Hong, et al. 2022

Preprint

Self Cite


Problem: Performance variability management is an active research area in high-performance computing (HPC). In this paper, we focus on input/output (I/O) variability, which is a complicated function affected by many system factors. To study performance variability, computer scientists often use grid-based designs (GBDs), which are equivalent to full factorial designs, to collect I/O variability data, and use mathematical approximation methods to build a prediction model. Mathematical approximation models, being deterministic, can be biased, particularly when extrapolation is needed. In the statistics literature, space-filling designs (SFDs) and surrogate models such as the Gaussian process (GP) are popular for data collection and for building predictive models. The applicability of SFDs and surrogates in the HPC variability management setting, however, needs investigation. In this case study, we investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability.

Approach: We first customize existing SFDs so that they can be applied in the HPC setting. We conduct a comprehensive investigation of design strategies and the prediction ability of approximation methods, using both synthetic data simulated from three test functions and real data from the HPC setting. We then compare the methods in terms of design efficiency, prediction accuracy, and scalability.

Results: In our synthetic and real data analyses, GP with SFDs performs best in most scenarios. With respect to the choice of approximation models, GP is recommended if
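For contrast with a grid-based design, a Latin hypercube is one widely used space-filling design: each factor's range is split into n equal slices and every slice is used exactly once. The sketch below is a basic Latin hypercube on the unit cube. The `to_levels` mapping onto discrete HPC factor levels is a hypothetical, naive customization, not the one developed in the paper, and the levels themselves are invented.

```python
import random


def latin_hypercube(n, d, seed=None):
    """n points in [0, 1]^d with exactly one point per 1/n slice of every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of slices across dimensions
        # Jitter each point uniformly inside its assigned slice.
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def to_levels(point, levels):
    """Hypothetical mapping from unit-cube coordinates to discrete factor levels."""
    return tuple(lv[min(int(u * len(lv)), len(lv) - 1)] for u, lv in zip(point, levels))


# Hypothetical levels for CPU frequency (GHz), I/O threads, and I/O scheduler.
levels = [[1.2, 1.6, 2.0, 2.3], [1, 2, 4, 8, 16], ["CFQ", "DEADLINE", "NOOP"]]
design = latin_hypercube(10, 3, seed=1)
configs = [to_levels(p, levels) for p in design]
for cfg in configs:
    print(cfg)
```

Ten runs here touch every slice of each factor once, whereas a full grid over these levels needs 4 × 5 × 3 = 60 runs. Real HPC factors are often discrete or categorical, which is why some customization of standard SFDs is needed.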
