The area of cluster-level energy management has attracted significant research attention over the past few years. One class of techniques for reducing the energy consumption of clusters is to selectively power down nodes during periods of low utilization. There are a number of ways to selectively power down nodes, each with a different impact on workload response time and overall energy consumption. Since the MapReduce framework is becoming "ubiquitous", the focus of this paper is on developing a framework for systematically considering various MapReduce node power down strategies, and their impact on the overall energy consumption and workload response time. We closely examine two extreme techniques that can be accommodated in this framework. The first is based on a recently proposed technique called "Covering Set" (CS) that keeps only a small fraction of the nodes powered up during periods of low utilization. At the other extreme is a technique that we propose in this paper, called the All-In Strategy (AIS). AIS uses all the nodes in the cluster to run a workload and then powers down the entire cluster. Using both actual evaluation and analytical modeling, we bring out the differences between these two extreme techniques and show that AIS is often the right energy saving strategy.
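The trade-off between the two strategies can be illustrated with a toy energy calculation. This is only a sketch and not the paper's analytical model: it assumes ideal linear speedup, ignores power-state transition costs, and every function name, parameter, and number below is hypothetical.

```python
def energy_all_in(n_nodes, work_node_hours, window_hours, p_active_w, p_off_w):
    """Watt-hours used when all nodes run the workload, then all power down.

    Toy model: assumes ideal linear speedup and free power-state transitions.
    """
    run_hours = work_node_hours / n_nodes
    off_hours = window_hours - run_hours
    return n_nodes * (p_active_w * run_hours + p_off_w * off_hours)


def energy_covering_set(n_nodes, n_live, work_node_hours, window_hours,
                        p_active_w, p_idle_w, p_off_w):
    """Watt-hours used when only n_live nodes stay up for the whole window."""
    run_hours = work_node_hours / n_live
    idle_hours = window_hours - run_hours
    live = n_live * (p_active_w * run_hours + p_idle_w * idle_hours)
    powered_down = (n_nodes - n_live) * p_off_w * window_hours
    return live + powered_down


# Hypothetical cluster: 10 nodes, 10 node-hours of work in a 10-hour window.
ais = energy_all_in(10, 10, 10, p_active_w=200, p_off_w=5)
cs = energy_covering_set(10, 2, 10, 10, p_active_w=200, p_idle_w=150, p_off_w=5)
print(ais, cs)
```

With these made-up numbers the all-in run uses less energy, but the outcome depends entirely on the chosen power draws, the degree of speedup, and the transition costs that this sketch leaves out.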
Energy consumption is a crucial and rising operational cost for data-intensive computing. In this paper we investigate some opportunities and challenges that arise in energy-aware computing in a cluster of servers running data-intensive workloads. A key insight is that in most data centers, servers are underutilized, which makes it attractive to consider powering down some servers and redistributing their load to others. Of course, powering down servers naively will render data stored only on powered down servers inaccessible. Data replication can be exploited to power down servers without losing access to data, but care must be taken in the design of the replication and server power down schemes to avoid creating load imbalances on the remaining "live" servers. Accordingly, in this paper we study the interaction between energy management, load balancing, and replication strategies for data-intensive cluster computing. In particular, we show that Chained Declustering, a replication strategy proposed more than 20 years ago, can support very flexible energy management schemes.
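A minimal sketch of the placement idea behind Chained Declustering may help here. It assumes the standard layout in which node i holds the primary copy of fragment i and the backup copy of fragment (i - 1) mod N; the function names are hypothetical, and the sketch checks only data availability, not the load balancing that the paper studies.

```python
def chained_declustering(num_nodes):
    """Map each node to the (primary, backup) fragment ids it stores."""
    return {node: (node, (node - 1) % num_nodes) for node in range(num_nodes)}


def available_fragments(placement, powered_down):
    """Return the set of fragments still readable from live nodes."""
    live = set()
    for node, fragments in placement.items():
        if node not in powered_down:
            live.update(fragments)
    return live


placement = chained_declustering(6)
all_fragments = set(range(6))

# Powering down every other node leaves one copy of every fragment online.
print(available_fragments(placement, {1, 3, 5}) == all_fragments)

# Powering down two adjacent nodes loses the fragment they share.
print(all_fragments - available_fragments(placement, {1, 2}))
```

In this layout, any set of powered-down nodes that contains no two neighbors on the chain keeps all data accessible, which is one reason the scheme leaves room for flexible power-down choices.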
As traditional and mission-critical relational database workloads migrate to the cloud in the form of Database-as-a-Service (DaaS), there is an increasing motivation to provide performance goals in Service Level Objectives (SLOs). Providing such performance goals is challenging for DaaS providers, as they must balance the performance that they can deliver to tenants against the data center's operating costs. In general, aggressively aggregating tenants on each server reduces the operating costs but degrades performance for the tenants, and vice versa. In this paper, we present a framework that takes as input the tenant workloads, their performance SLOs, and the server hardware that is available to the DaaS provider, and outputs a cost-effective recipe that specifies how much hardware to provision and how to schedule the tenants on each hardware resource. We evaluate our method and show that it produces effective solutions that can reduce the costs for the DaaS provider while meeting performance goals.
Energy is a growing component of the operational cost for many "big data" deployments, and hence has become increasingly important for practitioners of large-scale data analysis who require scale-out clusters or parallel DBMS appliances. Although a number of recent studies have investigated the energy efficiency of DBMSs, none of these studies have looked at the architectural design space of energy-efficient parallel DBMS clusters. There are many challenges to increasing the energy efficiency of a DBMS cluster, including dealing with the inherent scaling inefficiency of parallel data processing, and choosing the appropriate energy-efficient hardware. In this paper, we experimentally examine and analyze a number of key parameters related to these challenges for designing energy-efficient database clusters. We explore the cluster design space using empirical results and propose a model that considers the key bottlenecks to energy efficiency in a parallel DBMS. This paper represents a key first step in designing energy-efficient database clusters, which is increasingly important given the trend toward parallel database appliances.
As the size and complexity of analytic data processing systems continue to grow, the effort required to mitigate faults and performance skew has also risen. However, in some environments we have encountered, users prefer to continue query execution even in the presence of failures (e.g., the unavailability of certain data sources), and receive a "partial" answer to their query. We explore ways to characterize and classify these partial results, and describe an analytical framework that allows the system to perform coarse to fine-grained analysis to determine the semantics of a partial result. We propose that if the system is equipped with such a framework, in some cases it is better to return and explain partial results than to attempt to avoid them.
The current computing trend towards cloud-based Database-as-a-Service (DaaS) as an alternative to traditional on-site relational database management systems (RDBMSs) has largely been driven by the perceived simplicity and cost-effectiveness of migrating to a DaaS. However, customers that are attracted to these DaaS alternatives may find that the range of different services and pricing options available to them adds an unexpected level of complexity to their decision making. Cloud service pricing models are typically 'pay-as-you-go', in which the customer is charged based on resource usage such as CPU and memory utilization. Thus, customers considering different DaaS options must take into account how the performance and efficiency of the DaaS will ultimately impact their monthly bill. In this paper, we show that the current DaaS model can produce unpleasant surprises. For example, the case study that we present in this paper illustrates a scenario in which a DaaS service powered by a DBMS that has a lower hourly rate actually costs more to the end user than a DaaS service that is powered by another DBMS that charges a higher hourly rate. Thus, what we need is a method for the end user to get an accurate estimate of the true costs that will be incurred without worrying about the nuances of how the DaaS operates. One potential solution to this problem is for DaaS providers to offer a new service called Benchmark as a Service (BaaS), wherein the user provides the parameters of their workload and SLA requirements, and gets a price quote.
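The pricing surprise described above comes down to simple arithmetic: under pay-as-you-go billing, total cost is the hourly rate multiplied by the time the workload takes. The sketch below uses entirely hypothetical rates and runtimes, not figures from the case study.

```python
def workload_cost(hourly_rate_usd, runtime_hours):
    """Total bill for one workload run under simple pay-per-hour pricing."""
    return hourly_rate_usd * runtime_hours


# Hypothetical services: A is cheaper per hour but runs the workload slower.
cost_a = workload_cost(hourly_rate_usd=0.50, runtime_hours=10)
cost_b = workload_cost(hourly_rate_usd=0.80, runtime_hours=5)

print(cost_a, cost_b)
print("cheaper hourly rate costs more overall:", cost_a > cost_b)
```

A quote service of the kind proposed would, in effect, supply the runtime term that the customer cannot easily know in advance.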
Long time-series datasets are common in many domains, especially scientific domains. Applications in these fields often require comparing trajectories using similarity measures. Existing methods perform well for short time-series, but their evaluation cost grows rapidly for longer time-series. In this work, we develop a new time-series similarity measure called the Dictionary Compression Score (DCS). We show that this method allows us to accurately and quickly calculate similarity for both short and long time-series. We use the well-known Kolmogorov Complexity in information theory and the Lempel-Ziv compression framework as a basis to calculate similarity scores. We show that off-the-shelf compressors do not fare well for computing time-series similarity. To address this problem, we developed a novel dictionary-based compression technique to compute time-series similarity. We also develop heuristics to automatically identify suitable parameters for our method, thus removing the task of parameter tuning found in other existing methods. We have extensively compared DCS with existing similarity methods for classification. Our experimental evaluation shows that for long time-series datasets, DCS is accurate, and it is also significantly faster than existing methods.
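For context, the general idea of compression-based similarity can be sketched with the widely used normalized compression distance (NCD) and an off-the-shelf compressor. This is not DCS: it is the kind of generic baseline that the abstract reports does not fare well on time-series, and it assumes each series has already been discretized into bytes. The function names are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, a rough stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: lower means more similar."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two toy "series": a repeating pattern and pseudo-random noise.
rng = random.Random(0)
pattern = b"abc" * 200
noise = bytes(rng.randrange(256) for _ in range(600))

print(ncd(pattern, pattern) < ncd(pattern, noise))
```

The intuition is that if two sequences share structure, compressing them together costs little more than compressing one alone.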