Standardized benchmarks have become widely accepted tools for the comparison of products and evaluation of methodologies. These benchmarks are created by consortia like SPEC and TPC under confidentiality agreements which provide little opportunity for outside observers to get a look at the processes and concerns that are prevalent in benchmark development. This paper introduces the primary concerns of benchmark development from the perspectives of SPEC and TPC committees. We provide a benchmark definition, outline the types of benchmarks, and explain the characteristics of a good benchmark. We focus on the characteristics important for a standardized benchmark, as created by the SPEC and TPC consortia. To this end, we specify the primary criteria to be employed for benchmark design and workload selection. We use multiple standardized benchmarks as examples to demonstrate how these criteria are ensured.
Today’s system developers and operators face the challenge of creating software systems that make efficient use of dynamically allocated resources under highly variable and dynamic load profiles, while at the same time delivering reliable performance. Autonomic controllers, such as advanced autoscaling mechanisms in a cloud computing context, can benefit from an abstracted load model as knowledge for reconfiguring in a timely and precise manner. Existing workload characterization approaches have limited support for capturing variations in the interarrival times of incoming work units over time (i.e., a variable load profile). For example, industrial and scientific benchmarks support constant or stepwise increasing load, or interarrival times defined by statistical distributions or recorded traces. These options fall short either in how representative they are of load variation patterns or in the abstraction and flexibility of their format. In this article, we present the Descartes Load Intensity Model (DLIM) approach, which addresses these issues. DLIM provides a modeling formalism for describing load intensity variations over time. A DLIM instance is a compact formal description of a load intensity trace. DLIM-based tools provide features for benchmarking, performance analysis, and the analysis of recorded load intensity traces. As manually obtaining and maintaining DLIM instances becomes time consuming, we contribute three automated extraction methods and devise metrics for comparison and method selection. We discuss how these features are used to enhance system management approaches for adaptations during runtime, and how they are integrated into simulation contexts and enable benchmarking of elastic or adaptive behavior. We show that automatically extracted DLIM instances exhibit an average modeling error of 15.2% over 10 different real-world traces that cover between 2 weeks and 7 months. These results underline DLIM model expressiveness.
In terms of accuracy and processing speed, our proposed extraction methods for the descriptive models are comparable to existing time series decomposition methods. Additionally, we illustrate DLIM applicability by outlining approaches of workload modeling in systems engineering that employ or rely on our proposed load intensity modeling formalism.
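To make the idea of a load intensity model more concrete, the following minimal Python sketch describes an arrival rate as a function of time and compares a model against a recorded trace. It is not the DLIM formalism or its tooling: the decomposition into a base level, a sinusoidal seasonal part, and a linear trend is an assumption made for illustration, as are the names `load_intensity` and `mean_relative_error` and all parameter values. The error measure shown is one plausible choice and is not necessarily the metric behind the 15.2% figure reported above.

```python
import math


def load_intensity(t, period=24.0, base=100.0, amplitude=50.0, trend_per_unit=0.5):
    """Hypothetical arrival rate (work units per time unit) at time t.

    Assumed shape: constant base + sinusoidal seasonal part + linear trend.
    All parameters are illustrative, not taken from the article.
    """
    seasonal = amplitude * math.sin(2.0 * math.pi * t / period)
    trend = trend_per_unit * t
    # An arrival rate cannot be negative, so clamp at zero.
    return max(0.0, base + seasonal + trend)


def mean_relative_error(trace, model):
    """Mean of |model - trace| / trace over all points with a nonzero trace value."""
    pairs = [(actual, predicted) for actual, predicted in zip(trace, model) if actual != 0]
    if not pairs:
        raise ValueError("trace contains no nonzero values to compare against")
    return sum(abs(predicted - actual) / actual for actual, predicted in pairs) / len(pairs)


if __name__ == "__main__":
    # Sample the model once per time unit over two assumed periods.
    model_values = [load_intensity(t) for t in range(48)]
    # Stand-in for a recorded trace: the model values scaled by 10 percent.
    recorded_trace = [1.1 * value for value in model_values]
    print(f"relative error: {mean_relative_error(recorded_trace, model_values):.3f}")
```

A compact parametric description like this can be sampled at any resolution to drive a load generator, which is the kind of flexibility a recorded trace alone does not offer.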
Energy efficiency of servers has become a significant research topic in recent years, as server energy consumption varies depending on multiple factors, such as server utilization and workload type. Server energy analysis and estimation must take all relevant factors into account to ensure reliable estimates and conclusions. Thorough system analysis requires benchmarks capable of testing different system resources at different load levels using multiple workload types. Server energy estimation approaches, on the other hand, require knowledge about the interactions of these factors for the creation of accurate power models. Common approaches to energy-aware workload classification categorize workloads depending on the resource types used by the different workloads. However, they rarely take into account differences in workloads targeting the same resources. Industrial energy efficiency benchmarks typically do not evaluate the system's energy consumption at different resource load levels, and they only provide data for system analysis at maximum system load. In this paper, we benchmark multiple server configurations using the CPU worklets included in SPEC's Server Efficiency Rating Tool (SERT). We evaluate the impact of load levels and different CPU workloads on power consumption and energy efficiency. We analyze how functions approximating the measured power consumption differ over multiple server configurations and architectures. We show that workloads targeting the same resource can differ significantly in their power draw and energy efficiency. The power consumption of a given workload type varies depending on utilization, hardware, and software configuration. The power consumption of CPU-intensive workloads does not scale uniformly with increased load, nor do hardware or software configuration changes affect it in a uniform manner.