Advances in structure-based drug design in the pharmaceutical industry, in parallel with structural genomics initiatives in the public domain, have led to an explosion in the number of structures of protein-small molecule complexes. This information is critically important both to understanding the structural basis for molecular recognition in biological systems and to designing better drugs. Managing this vast amount of data and fully leveraging it remains a significant challenge. Here, we review our work to develop a simple, fast way to store, organize, mine, and analyze large numbers of protein-small molecule complexes. We illustrate the utility of the approach in managing inhibitor complexes from the protein kinase family. Finally, we describe our recent efforts in applying this method to the design of target-focused chemical libraries.
Received 25 October 2005
The past decade has witnessed a dramatic increase in the number of protein-small molecule complex structures from experimental as well as in silico approaches. Over 5000 small molecule complexes have been deposited in public databases [1], and it is likely that a much greater number have been determined within the pharmaceutical industry. With significant progress in structural genomics initiatives [2] and advances in high-throughput crystallography [3] and high-throughput NMR technology [4], the total number of structures will grow at an even greater speed. In parallel with the growth of experimentally determined structures, a wealth of in silico structural information is also being generated from virtual screening efforts. Fully leveraging this experimental and in silico information hinges on the ability to organize, analyze, and mine the structural data, both to derive insights into molecular recognition in biological systems and to facilitate the design of novel therapeutics.
Several approaches exist for analyzing the interactions of protein-small molecule complexes. For example, LIGPLOT generates two-dimensional (2D) schematics of protein-small molecule complexes that describe the intermolecular interactions, including the strengths of hydrogen bonds and hydrophobic interactions [5]. In addition to 2D approaches, a variety of sophisticated computer graphics programs such as InsightII (Accelrys Inc., Burlington, MA, USA), VIDA (Open Eye, Cambridge, MA, USA), Sybyl (Tripos Inc., St Louis, MO, USA), and Maestro (Schrodinger Inc., New York, NY, USA) are available for analyzing the 3D structure of proteins and their complexes [6]. Although these 2D and 3D approaches are useful for examining small numbers of structures, they do not scale to the analysis of large datasets.
In addition to visual analysis of complexes, energy-based methods are often used to evaluate how favorable the binding interactions between a protein and a small molecule are. These scoring methods are commonly used to rank and filter large datasets of virtual screening results [7]. Recent studies have demonst...