Accounting for protein flexibility is an essential yet challenging component of structure-based virtual screening. Whereas an ideal approach would account for full protein and ligand flexibility during the virtual screening process, this is currently intractable using available computational resources. An alternative is ensemble docking, where calculations are performed on a set of individual rigid receptor conformations and the results combined. The primary challenge associated with this approach is the choice of receptor structures to use for the docking calculations. In this work, we show that selection of a small set of structures based on clustering on binding site volume overlaps provides an efficient and effective way to account for protein flexibility in virtual screening. We first apply the method to crystal structures of cyclin-dependent kinase 2 and HIV protease and show that virtual screening for ensembles of four cluster representative structures yields consistently high enrichments and diverse actives. We then apply the method to a structural ensemble of the androgen receptor generated with molecular dynamics and obtain results that are in agreement with those from the crystal structures of cyclin-dependent kinase 2 and HIV protease. This work provides a step forward in the incorporation of protein flexibility into structure-based virtual screening.

Key words: binding site volume, clustering, docking, molecular dynamics

Received 2 January 2012, revised 10 March 2012 and accepted for publication 9 April 2012

Virtual screening is an important part of computer-aided drug design, with many reviews (1-3) and successful applications reported (4-7) in the literature. Although ligand-based methods have been shown to yield high database enrichments in virtual screening (8-10), use of structural information, when available, provides the promise of high enrichments of hits that are not biased by knowledge of existing active compounds.
Indeed, structure-based virtual screening has been applied successfully in many instances where novel compounds unrelated to existing known actives were found (11-16). However, docking tools have been most heavily validated in the context of a rigid receptor model for both pose prediction (17-22) and virtual screening (23,24), which helps reduce the conformational sampling space and makes the calculations more tractable from a computational perspective. Speed and throughput constraints are particularly important in virtual screening, considering that calculations are often performed on databases of millions of compounds and turnaround time is expected on the order of days to weeks for the computational approach to have a significant impact in the discovery process.

Although successful efforts have been made to account for protein flexibility in pose prediction (25-28) and structure-based virtual screening (28-31), additional work is needed to develop a standardized protocol that can consistently add value over the standard rigid receptor approach. Rueda et al. (32) have focused on...