Cloud computing brings convenience and cost efficiency to users, but multiplexing virtual machines (VMs) on a single physical machine (PM) results in various cybersecurity risks. For example, a co-resident attack could occur when malicious VMs use shared resources on the hosting PM to control or gain unauthorized access to other benign VMs. Most task schedulers do not contribute to both resource management and risk control. This article studies how to minimize the co-resident risks while optimizing the VM completion time through designing efficient VM allocation policies. A zero-trust threat model is defined with a set of co-resident risk mitigation parameters to support this argument and assume that all VMs are malicious. In order to reduce the chances of co-residency, deep reinforcement learning (DRL) is adopted to decide the VM allocation strategy. An effective cost function is developed to guide the reinforcement learning (RL) policy training. Compared with other traditional scheduling paradigms, the proposed system achieves plausible mitigation of co-resident attacks with a relatively small VM slowdown ratio.