Abstract-With increasingly mature virtual machine (VM) technology, the compute resources provided by cloud systems can be divided or isolated on demand under a payment model. Leveraging this feature, we design and implement a cloud system that optimizes the overall performance of processing user requests composed of composite services. Specifically, we aim to minimize the response time of each user request and to maximize the fairness of treatment in the competitive situation, that is, when resources are in short supply. We first design an optimal VM resource allocation scheme that minimizes the virtual machine monitor (VMM) operation cost for each task. Then, to maximize the fairness of treatment in the competitive situation, we design a best-suited queuing policy and a resource-sharing scheme adapted from the Proportional-Share model, which together effectively disperse resource contention. Experiments confirm two points: (1) in the non-competitive situation, the mean task response time is close to the theoretically optimal value; (2) when the system runs in short supply, each request is still processed efficiently, with only a slight increase in its response time over the ideal value. The solution that combines the Lightest Workload First (LWF) queuing policy with our Adjusted Proportional-Share Model (APSM), denoted LWF+APSM, exhibits the best and most stable performance. In the competitive situation, it outperforms the other solutions by 38% w.r.t. the worst-case response time and by 12% w.r.t. the fairness of treatment.
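As a rough illustration of the two ingredients named above, the following minimal sketch orders a queue by Lightest Workload First and splits a limited capacity under a plain proportional-share rule. It is not the paper's APSM, whose adjustment is not specified here. The `Task` fields (`workload`, `demand`), the function names, and the choice to share in proportion to demand are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    workload: float  # hypothetical: total amount of work in the task
    demand: float    # hypothetical: resource amount the task would ideally get


def lwf_order(tasks):
    """Lightest Workload First: smallest workload is served first.

    sorted() is stable, so tasks with equal workload keep arrival order.
    """
    return sorted(tasks, key=lambda t: t.workload)


def proportional_share(tasks, capacity):
    """Split `capacity` among tasks in proportion to their demand.

    Assumption: if total demand fits within capacity (non-competitive
    case), every task simply receives its full demand.
    """
    total = sum(t.demand for t in tasks)
    if total <= capacity:
        return {t.name: t.demand for t in tasks}
    scale = capacity / total  # same shrink factor for every task
    return {t.name: t.demand * scale for t in tasks}


tasks = [Task("a", 5, 4), Task("b", 2, 4), Task("c", 9, 8)]
queue = [t.name for t in lwf_order(tasks)]
shares = proportional_share(tasks, capacity=8)
```

With a capacity of 8 against a total demand of 16, every task in this toy example receives half of what it asked for, so no single request absorbs the whole shortfall.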