Traditional speedup models, such as Amdahl's Law, Gustafson's model, and Sun and Ni's model, have helped the research community and industry to better understand the performance capabilities of systems and the parallelizability of applications. Because they mostly target homogeneous hardware platforms or a limited form of processor heterogeneity, these models do not cover newly emerging multi-core heterogeneous architectures. This paper reports novel speedup and energy consumption models based on a more general representation of heterogeneity, called normal form heterogeneity, which supports a wide range of heterogeneous many-core architectures. The modelling method aims to predict system energy efficiency and performance ranges, and facilitates research and development at the hardware and system software levels. Extensive experimentation on an off-the-shelf big.LITTLE heterogeneous platform validates the models, showing less than 1% error for speedup and less than 4% error for power dissipation. The practical use of the method is demonstrated with a quantitative study of system load balancing efficiency.
Traditional speedup models, such as Amdahl's law, facilitate the study of the impact of running parallel workloads on many-core systems. However, these models are typically based on software characteristics and assume ideal hardware behavior. As such, their applicability to energy- and/or performance-driven system optimization is limited by two factors. First, speedup cannot be measured without instrumenting the original software code; second, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method whereby standard performance counters found in modern many-core platforms can be used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as the ratio of the instructions per cycle of a parallel many-core system to the instructions per cycle of a single-core system. By studying the application instructions and system instructions for the first time, our method leads to the determination of the parallelization factor and the optimal system configuration for energy and/or performance. The method is extensively demonstrated through experiments on three different platforms with core counts ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on the Linux operating system. Speedup and parallelization estimates obtained with our method, together with extensive cross-validation, show low errors (up to 8%) on these systems.
Additionally, we demonstrate the effectiveness of our method to explore parallelization-aware energy-efficient system configurations for many-core systems using energy-delay-product based formulations.
Index Terms: Many-core processors; speedup; performance counter; power normalized performance; energy-delay-product.
• Extend Amdahl's speedup model considering applications and system software related overhead separately.
• Propose a new method to model parallelization and speedup via performance counters to avoid the need for instrumenting applications. We show that speedup can be accurately estimated as a ratio of instructions retired/executed per cycle of
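The counter-based estimate described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the counter values are assumed to have been read beforehand (for example from a profiling tool), the parallel fraction is obtained by inverting the classical Amdahl's law rather than the extended model the paper proposes, and the energy-delay product is taken in its common form of energy multiplied by execution time.

```python
def ipc(instructions: float, cycles: float) -> float:
    """Instructions per cycle from two raw counter readings."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def speedup_from_ipc(ipc_parallel: float, ipc_single: float) -> float:
    """Speedup estimated as the ratio of parallel-system IPC to single-core IPC."""
    if ipc_single <= 0:
        raise ValueError("ipc_single must be positive")
    return ipc_parallel / ipc_single


def parallel_fraction(speedup: float, n_cores: int) -> float:
    """Invert classical Amdahl's law, S = 1 / ((1 - f) + f / n), to solve for f.

    Assumption: the classical law is used here for illustration only; the
    paper's extended model treats system-software overhead separately.
    """
    if n_cores < 2:
        raise ValueError("n_cores must be at least 2")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cores)


def energy_delay_product(power_watts: float, time_s: float) -> float:
    """EDP = energy * delay = (power * time) * time."""
    return power_watts * time_s ** 2


if __name__ == "__main__":
    # Hypothetical counter readings, not measurements from the paper.
    single = ipc(instructions=8.0e9, cycles=8.0e9)
    parallel = ipc(instructions=8.0e9, cycles=3.2e9)
    s = speedup_from_ipc(parallel, single)
    f = parallel_fraction(s, n_cores=4)
    print(f"estimated speedup: {s:.2f}, parallel fraction: {f:.2f}")
    print(f"EDP at 2 W for 3 s: {energy_delay_product(2.0, 3.0):.1f} J*s")
```

Comparing the energy-delay product across candidate core counts is one way to rank configurations, since it penalises both high energy and long execution time.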
For over 50 years, Amdahl's Law has been the hallmark model for reasoning about performance bounds for homogeneous parallel computing resources. As heterogeneous, many-core parallel resources continue to permeate into the modern server and embedded domains, there has been growing interest in promulgating realistic extensions and assumptions in keeping with newer use cases. This study aims to provide a comprehensive review of the purviews and insights provided by the extensive body of work related to Amdahl's law to date, focusing on computation speedup. The authors show that a significant portion of these studies has looked into analysing the scalability of the model considering both workload and system heterogeneity in real-world applications. The focus has been to improve the definition and semantic power of the two key parameters in the original model: the parallel fraction (f) and the computation capability improvement index (n). More recently, researchers have shown normal-form and multi-fraction extensions that can account for wider ranges of heterogeneity, validated on many-core systems running realistic workloads. Speedup models from Amdahl's law onwards have seen a wide range of uses, such as the optimisation of system execution, and these uses are even more important with the advent of the heterogeneous many-core era.
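As a point of reference for the two parameters named in this review, the original Amdahl's law can be written as S(n) = 1 / ((1 - f) + f / n). The short Python sketch below evaluates that classical form only; the function names are illustrative, and none of the heterogeneous, normal-form, or multi-fraction extensions surveyed in the review are modelled.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Classical Amdahl speedup for parallel fraction f and improvement index n."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as n grows without bound: 1 / (1 - f)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must lie in [0, 1) for a finite limit")
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    # With f = 0.9, adding cores yields diminishing returns toward the limit.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"n = {n:3d}: speedup = {amdahl_speedup(0.9, n):.3f}")
    print(f"limit for f = 0.9: {amdahl_limit(0.9):.1f}")
```

The bound shows why the serial fraction dominates at scale: for any fixed f below 1, the speedup never exceeds 1 / (1 - f), however large n becomes.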