Virtual machine (VM) technology provides an additional layer of abstraction for resource management in high-performance computing (HPC) systems. In large-scale computing clusters, ever-increasing system complexity makes component failures the norm rather than the exception. VM construction and reconfiguration is a potent tool for efficient online system maintenance and failure resilience. In this paper, we study how VM-based HPC clusters benefit from failure prediction in resource management for dependable computing. We consider both the reliability and the performance status of compute nodes when making selection decisions. We define a capacity-reliability metric to combine the effects of both factors, and propose the Best-fit algorithm to find the best-qualified nodes on which to instantiate VMs to run user jobs. We have conducted experiments using failure traces from the Los Alamos National Laboratory (LANL) HPC clusters. The results show that our proposed strategy enhances system dependability with practically achievable failure-prediction accuracy. With the Best-fit strategies, the job completion rate is increased by 10.5% compared with that achieved in the current LANL HPC cluster. The task completion rate reaches 82.5% with improved utilization of relatively unreliable nodes.