Orthrus: A Framework for Implementing Efficient Collective I/O in Multi-core Clusters

Zhang, Xuechen; Ou, Jianqiang; Davis, Kei; Jiang, Song

doi:10.1007/978-3-319-07518-1_22

Cited by 4 publications (1 citation statement)

References 18 publications (19 reference statements)

“…There are many variations to this basic process: The Two-Phase protocol as discussed by Thakur et al. [1] iterates over communication and I/O phases; in each phase, a maximum amount of data is accessed. Multiphase-I/O [3,4] iteratively increases locality, and Orthrus [5] offers several strategies to optimize either for file or process locality. One difficulty with these approaches is that they require careful analysis and tuning of parameters.…”

Section: Related Work (mentioning)

confidence: 99%

Predicting Performance of Non-contiguous I/O with Machine Learning

Kunkel; Zimmer; Betke

2015

Lecture Notes in Computer Science

Data sieving in ROMIO promises to optimize individual noncontiguous I/O. However, making the right choice and parameterizing its buffer size accordingly are non-trivial tasks, since predicting the resulting performance is difficult. Since many performance factors are not taken into account by data sieving, extracting the optimal performance for a given access pattern and system is often not possible. Additionally, in Lustre, settings such as the stripe size and number of servers are tunable, yet identifying rules for the data centre is again non-trivial. In this paper, we 1) discuss limitations of data sieving, 2) apply machine learning techniques to build a performance predictor, and 3) learn and extract best practices for the settings from the data. We used decision trees as these models can capture non-linear behavior, are easy to understand and allow for extraction of the rules used. Even though this initial research is based on decision trees, with sparse training data, the algorithm can predict many cases sufficiently well. Compared to a standard setting, the decision trees created are able to improve performance significantly and we can derive expert knowledge by extracting rules from the learned tree. Applying the scheme to a set of experimental data improved the average throughput by 25-50% of the best parametrization's gain. Additionally, we demonstrate the versatility of this approach by applying it to the porting system of DKRZ's next generation supercomputer and discuss achievable performance gains.
