Workload prediction has been widely researched in the literature. However, existing techniques are per-job based and are useful mainly for service-like tasks whose workloads exhibit seasonality and trend. Cloud jobs, in contrast, have many different workload patterns, and some do not exhibit recurring patterns. We consider job-pool-based workload estimation, which analyzes the characteristics of existing tasks' workloads to estimate the workloads of currently running tasks. We first cluster existing tasks based on their workloads. For a new task J, we collect the initial workload of J, determine which cluster J may belong to, and then use that cluster's characteristics to estimate J's workload. The algorithm is evaluated experimentally on the Google dataset, and its effectiveness is confirmed. However, the workload patterns of some tasks do have seasonality and trend, and for these tasks conventional per-job-based regression methods may yield better prediction results. Also, in some cases, new tasks may not follow the workload patterns of any existing tasks in the pool. We therefore develop an integrated scheme that combines clustering and regression and exploits the best of both for workload prediction. Our experimental study shows that the combined approach can further improve the accuracy of workload prediction.

KEYWORDS
cloud computing, dynamic time warp distance, workload clustering, workload estimation

1 INTRODUCTION

Cloud computing is now widely used in e-commerce, education, government, and other fields. Studies have shown significant benefits offered by Infrastructure as a Service (IaaS) clouds, including a greener computing environment [1]. Many systems have been developed to take advantage of the cloud, including big data analytics, offloading from mobile devices [2], and cloud storage systems [3-5]. However, without efficient resource management by the IaaS provider, the potential value of cloud computing cannot be fully realized.
One of the important tasks of an IaaS cloud provider is to schedule resources precisely, minimizing the cost of deploying and operating the cloud platform while fully guaranteeing the service level agreement (SLA) of each customer. For example, reducing the number of operating servers by migrating tasks can save power and provide greener computation. To manage cloud resources effectively, future workloads in the cloud must be predicted well. Many workload prediction algorithms have been studied in the literature [6-8]. However, they use per-job-based methods (which predict a single job's future workload based only on its own historical workload) and consider service-like workloads. Generally, services are accessed by users and run year-round to process user requests. The workload characteristics of such a system depend heavily on its user access patterns and mostly exhibit periodicity and trend. These workloads can be predicted using well-established statistical techniques, such as autocorrelation and regression.
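To make the per-job approach concrete, the following Python sketch shows one way autocorrelation and regression might be combined to forecast a service-like workload from its own history. It is an illustration under stated assumptions, not the method of any cited work: the workload is assumed to be a regularly sampled series with a linear trend plus one dominant period, and all function names (`autocorrelation`, `dominant_period`, `fit_linear_trend`, `forecast`) are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0  # a constant series carries no correlation structure
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var


def dominant_period(series, max_lag):
    """Lag of the highest local peak of the autocorrelation in [2, max_lag].

    Requires max_lag + 1 < len(series). Returns 1 if no peak is found,
    which is treated as "no seasonality" by `forecast`.
    """
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    peaks = [lag for lag in range(2, max_lag + 1)
             if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else 1


def fit_linear_trend(series):
    """Ordinary least-squares fit of y = intercept + slope * t (needs >= 2 samples)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def forecast(series, horizon, max_lag):
    """Forecast `horizon` future samples as linear trend + seasonal profile."""
    intercept, slope = fit_linear_trend(series)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
    period = dominant_period(residuals, max_lag)
    # Seasonal profile: mean detrended residual observed at each phase.
    seasonal = []
    for phase in range(period):
        values = residuals[phase::period]
        seasonal.append(sum(values) / len(values))
    n = len(series)
    return [intercept + slope * t + seasonal[t % period]
            for t in range(n, n + horizon)]


if __name__ == "__main__":
    # Assumed synthetic workload: hourly samples, daily cycle, slow growth.
    history = [50 + 0.1 * t + 10 * math.sin(2 * math.pi * t / 24)
               for t in range(240)]
    print(forecast(history, horizon=6, max_lag=48))
```

The sketch depends entirely on the job having enough history to reveal a period and a trend. A short-lived job, or one with no recurring pattern, gives the autocorrelation step nothing to detect.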