Object replication is a common approach to enhance the availability of distributed data-intensive services and storage systems. Many such systems are known to have highly skewed object request probability distributions. In this paper, we propose an object replication degree customization scheme that maximizes the expected service availability under given object request probabilities, object sizes, and space constraints (e.g., memory/storage capacities). In particular, we discover that the optimal replication degree of an object should be linear in the logarithm of its popularity-to-size ratio. We also study the feasibility and effectiveness of our proposed scheme using applications driven by real-life system object request traces and machine failure traces. When the data object popularity distribution is known a priori, our proposed customization can achieve 1.32-2.92 "nines" increase in system availability (or 21-74% space savings at the same availability level) compared to uniform replication. Results also suggest that our scheme requires a moderate amount of replica creation/removal overhead (weekly changes involve no more than 0.24% objects and no more than 0.11% of total data size) under realistic object request popularity changes.
Complex system software allows a variety of execution conditions on system configurations and workload properties. This paper explores a principled use of reference executions--those of similar execution conditions from the target--to help identify the symptoms and causes of performance anomalies. First, to identify anomaly symptoms, we construct change profiles that probabilistically characterize expected performance deviations between target and reference executions. By synthesizing several single-parameter change profiles, we can scalably identify anomalous reference-to-target changes in a complex system with multiple execution parameters. Second, to narrow the scope of anomaly root cause analysis, we filter anomaly-related low-level system metrics as those that manifest very differently between target and reference executions. Our anomaly identification approach requires little expert knowledge or detailed models on system internals and consequently it can be easily deployed. Using empirical case studies on the Linux I/O subsystem and a J2EE-based distributed online service, we demonstrate our approach's effectiveness in identifying performance anomalies over a wide range of execution conditions as well as multiple system software versions. In particular, we discovered five previously unknown performance anomaly causes in the Linux 2.6.23 kernel. Additionally, our preliminary results suggest that online anomaly detection and system reconfiguration may help evade performance anomalies in complex online systems.
Today's processors provide a rich source of statistical informationon application execution through hardware counters. In this paper, we explore the utilization of these statistics as request signaturesin server applications for identifying requests and inferring high-level request properties ( e.g. , CPU and I/O resource needs). Our key finding is that effective request signatures may be constructed using a small amount of hardware statistics while the request is still in an early stage of its execution. Such on-the-fly request identification and property inference allow guided operating system adaptation at request granularity ( e.g. , resource-aware request scheduling and on-the-fly request classification). We address the challenges of selecting hardware counter metrics for signature construction and providing necessary operating system support for per-request statistics management. Our implementation in the Linux 2.6.10 kernel suggests that our approach requires low overhead suitable for runtime deployment. Our on-the-fly request resource consumption inference (averaging 7%, 3%, 20%, and 41% prediction errors for four server workloads, TPC-C, TPC-H, J2EE-based RUBiS, and a trace-driven index search, respectively) is much more accurate than the online running-average based prediction (73-82% errors). Its use for resource-aware request scheduling results in a 15-70% response time reduction for three CPU-bound applications. Its use for on-the-fly request classification and anomaly detection exhibits high accuracy for the TPC-H workload with synthetically generated anomalous requests following a typical SQL-injection attack pattern.
Today's processors provide a rich source of statistical informationon application execution through hardware counters. In this paper, we explore the utilization of these statistics as request signaturesin server applications for identifying requests and inferring high-level request properties ( e.g. , CPU and I/O resource needs). Our key finding is that effective request signatures may be constructed using a small amount of hardware statistics while the request is still in an early stage of its execution. Such on-the-fly request identification and property inference allow guided operating system adaptation at request granularity ( e.g. , resource-aware request scheduling and on-the-fly request classification). We address the challenges of selecting hardware counter metrics for signature construction and providing necessary operating system support for per-request statistics management. Our implementation in the Linux 2.6.10 kernel suggests that our approach requires low overhead suitable for runtime deployment. Our on-the-fly request resource consumption inference (averaging 7%, 3%, 20%, and 41% prediction errors for four server workloads, TPC-C, TPC-H, J2EE-based RUBiS, and a trace-driven index search, respectively) is much more accurate than the online running-average based prediction (73-82% errors). Its use for resource-aware request scheduling results in a 15-70% response time reduction for three CPU-bound applications. Its use for on-the-fly request classification and anomaly detection exhibits high accuracy for the TPC-H workload with synthetically generated anomalous requests following a typical SQL-injection attack pattern.
Power capping and energy efficiency are critical concerns in server systems, particularly when serving dynamic workloads on resourcesharing multicores. We present a new operating system facility (power and energy containers) that accounts for and controls the power/energy usage of individual fine-grained server requests. This facility is enabled by novel techniques for multicore power attribution to concurrent tasks, measurement/modeling alignment to enhance predictability, and request power accounting and control.
A large number of user requests execute (often concurrently) within a server system. A single request may exhibit fluctuating hardware characteristics (such as instruction completion rate and on-chip resource usage) over the course of its execution, due to inherent variations in application execution semantics as well as dynamic resource competition on resource-sharing processors like multicores. Understanding such behavior variations can assist fine-grained request modeling and adaptive resource management.This paper presents operating system management to track request behavior variations online. In addition to metric sample collection during periodic interrupts, we exploit the frequent system calls in server applications to perform low-cost in-kernel sampling. We utilize identified behavior variations to support or enhance request modeling in request classification, anomaly analysis, and online request signature construction. A foundation of our request modeling is the ability to quantify the difference between two requests' time series behaviors. We evaluate several differencing measures and enhance the classic dynamic time warping technique with additional penalties for asynchronous warp steps. Finally, motivated by fluctuating request resource usage and the resulting contention, we implement contention-easing CPU scheduling on multicore platforms and demonstrate its effectiveness in improving the worst-case request performance.Experiments in this paper are based on five server applicationsApache web server, TPCC, TPCH, RUBiS online auction benchmark, and a user-content-driven online teaching application called WeBWorK.
Operating system laboratory assignments based on bare hardware or detailed machine simulators can be excessively challenging for many students. In the most often used approach, students develop kernels on virtual machines with a much simplified hardware interface. Traditionally this simplification goes so far as to make realistic performance measurement impossible. We propose Vesper, an instructional disk drive simulator with a high degree of performance realism. Vesper retains simplicity while providing timing statistics close to that of real disk drives. The key to our approach is to provide hardware abstractions that are simple but yet capable of capturing device interactions with major performance impacts. Vesper laboratory assignments allow students to realistically explore the performance consequences of various system designs without the cumbersome aspects of the real hardware interface. This paper describes the design and implementation of the Vesper disk drive simulator. We evaluate the effectiveness of Vesper-based laboratory assignments in terms of operating system performance evaluation. Student experience and feedback are also reported.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.