Cloud computing has transformed the IT industry in recent years, as it can deliver subscription-based services to users under a pay-as-you-go model. Meanwhile, multimedia cloud computing is emerging on top of cloud computing to provide a variety of media services over the Internet. However, with the growing popularity of multimedia cloud computing, its large energy consumption not only contributes to greenhouse gas emissions but also raises cloud users’ costs. Therefore, multimedia cloud providers should minimize their energy consumption as much as possible while satisfying consumers’ resource requirements and guaranteeing quality of service (QoS). In this paper, we propose a remaining utilization-aware (RUA) algorithm for virtual machine (VM) placement and a power-aware (PA) algorithm that finds suitable hosts to shut down for energy saving. The two algorithms are combined and applied to cloud data centers to complete the process of VM consolidation. Simulation results show that there is a trade-off between a cloud data center’s energy consumption and service-level agreement (SLA) violations. In addition, the RUA algorithm can handle variable workloads, preventing hosts from becoming overloaded after VM placement and reducing SLA violations dramatically.
To reduce power and energy costs, large cloud providers now mix online and batch jobs on the same cluster. Although co-allocating such jobs improves machine utilization, it challenges the data center scheduler and workload assignment in terms of quality of service, fault tolerance, and failure recovery, especially for latency-critical online services. In this paper, we explore various characteristics of co-allocated online services and batch jobs from a production cluster containing 1.3k servers in Alibaba Cloud. From the trace data, we find the following: 1) For batch jobs with multiple tasks and instances, 50.8% of failed tasks keep waiting and are halted only after a very long time interval following the failure of their first, and only, failed instance. This wastes considerable time and resources, because the remaining instances keep running toward a successful termination that is no longer possible. 2) Online service jobs are clustered into 25 categories according to their requested CPU, memory, and disk resources. Such clustering can help co-allocate online service jobs with batch jobs. 3) Servers are clustered into seven groups by CPU utilization, memory utilization, and the correlation between them. Machines with a strong correlation between CPU and memory utilization provide an opportunity for job co-allocation and resource utilization estimation. 4) The MTBF (mean time between failures) of instances is in the interval [400, 800] seconds, while the average completion time of the 99th percentile is 1003 seconds. We also compare the cumulative distribution functions of jobs and servers and explain the differences and opportunities for workload assignment between them. The findings and insights presented in this paper can help the community and data center operators better understand workload characteristics and improve resource utilization and failure recovery design. INDEX TERMS Co-allocated jobs, workload characterization, online services, batch jobs, data center, scheduling.