The current state of practice in supercomputer resource allocation places jobs from different users on disjoint nodes, both in time and in space. While this approach largely guarantees that jobs from different users do not degrade one another's performance, it does so at high cost to system throughput and energy efficiency. This focused study presents job striping, a technique that significantly increases performance over the current allocation mechanism by colocating pairs of jobs from different users on a shared set of nodes. To evaluate the potential of job striping in large-scale environments, the experiments are run at the scale of 128 nodes on the state-of-the-art Gordon supercomputer. Across all pairings of the 1024-process NAS Parallel Benchmarks (NPBs), job striping increases mean throughput by 26% and mean energy efficiency by 22%. On pairings of the real applications Gyrokinetic Toroidal Code (GTC), Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and MIMD Lattice Computation (MILC) at equal scale, job striping improves mean throughput by 12% and mean energy efficiency by 11%. In addition, the study provides a simple set of heuristics for avoiding low-performing application pairs.

PERFORMANCE RESULTS

Compact versus spreading versus striping

Figure 3. Increase in system throughput (STP) over compact when applying job spreading and striping to the NAS parallel benchmarks and GTC, LAMMPS, and MILC.

Figure 3 shows the performance results for the first set of experiments with the NPBs and the second set with GTC, LAMMPS, and MILC. For the NPBs, the mean performance increase from job spreading is 50%. Across striped coschedules of non-identical NPBs, the mean performance increase is 26%. If one selects the best running mate other than embarrassingly parallel (EP) for each benchmark, the mean increase in performance is 36%. We exclude EP because it is minimally contentious. Each EP task's working set fits entirely in the private levels of cache, and EP spends very little time in active communication. Because of these traits, EP universally causes every application it stripes with to achieve its best striped performance. Thus, for the sake of fairness and realism, we exclude these results from the 'Best' average. For the NPBs, random striping yields about 50% of the performance benefit of job spreading, and striping each job with its best running mate provides 70% of that benefit. This trend continues for the real applications as well: for GTC, LAMMPS, and MILC, job spreading increases throughput by 23%, while mean heterogeneous striping and mean best striping improve performance by 12% and 16%, respectively.
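The gains above are reported as system throughput (STP) over the compact baseline. As a minimal sketch of that bookkeeping, assuming STP is the sum of each coscheduled job's normalized progress (the excerpt does not spell out the formula, and the numbers below are illustrative rather than measured):

```python
def stp_gain(normalized_progress):
    """Increase in system throughput (STP) over the compact baseline.

    `normalized_progress` holds each coscheduled job's speed on the
    shared nodes relative to running alone on dedicated nodes
    (1.0 = no slowdown). Under compact allocation the same node set
    runs a single job at full speed, so the baseline STP is 1.0.
    """
    return sum(normalized_progress) - 1.0

# Illustrative: if each of two striped jobs retains 63% of its
# dedicated-node speed, the shared nodes complete 1.26 jobs' worth of
# work per unit time, a 26% throughput increase over compact.
print(stp_gain([0.63, 0.63]))  # ~0.26
# A pairing where each job halves in speed yields no gain at all:
print(stp_gain([0.50, 0.50]))  # ~0.0
```

Under this framing, a minimally contentious partner such as EP maximizes striped gains precisely because its running mate retains most of its dedicated-node speed.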
NAS parallel benchmarks

In this section, we examine the increase in collective throughput and energy efficiency for pairs of striped NPBs. The results are presented in Figure 4. For completeness, we run all pairwise combinations. This includes both heterogeneous pairings ...