We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein-ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å of the crystal structure for 79% of the test cases using our rigid ligand docking algorithm with an average run time of 1 min per complex and for 72% of the test cases using our flexible ligand docking algorithm with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for up to 7 rotatable bonds; i.e., 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.
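The success criterion quoted above (a docked pose within 2 Å of the crystal pose) can be illustrated with a minimal sketch. It assumes that both poses list the same atoms in the same order, that no superposition or symmetry correction is applied, and that "within 2 Å" means an RMSD less than or equal to the cutoff. The function names and toy coordinates are hypothetical and are not taken from DOCK.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must be non-empty and the same length")
    total = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(pose, reference)
    )
    return math.sqrt(total / len(pose))

def success_rate(pairs, cutoff=2.0):
    """Fraction of (docked, crystal) pairs whose RMSD is at most `cutoff` (in Å)."""
    hits = sum(1 for docked, crystal in pairs if rmsd(docked, crystal) <= cutoff)
    return hits / len(pairs)

# Toy data (made up): one three-atom "ligand" and two docked poses.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
near_pose = [(x + 1.0, y, z) for x, y, z in crystal]   # shifted 1 Å along x
far_pose = [(x, y, z + 3.0) for x, y, z in crystal]    # shifted 3 Å along z

pairs = [(near_pose, crystal), (far_pose, crystal)]
rate = success_rate(pairs)
```

Here `near_pose` counts as a success and `far_pose` as a failure, so half of the toy cases pass. A production comparison would usually restrict the calculation to heavy atoms and account for ligand symmetry.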
The study of mechanistically diverse enzyme superfamilies (collections of enzymes that perform different overall reactions but share both a common fold and a distinct mechanistic step performed by key conserved residues) helps elucidate the structure-function relationships of enzymes. We have developed a resource, the structure-function linkage database (SFLD), to analyze these structure-function relationships. Unique to the SFLD is its hierarchical classification scheme based on linking the specific partial reactions (or other chemical capabilities) that are conserved at the superfamily, subgroup, and family levels with the conserved structural elements that mediate them. We present the results of analyses using the SFLD in correcting misannotations, guiding protein engineering experiments, and elucidating the function of recently solved enzyme structures from the structural genomics initiative. The SFLD is freely accessible at http://sfld.rbvi.ucsf.edu.
As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available databases for protein functional identification. When no exact match is found, or when the possible matches only partially account for the molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the basic additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined by mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry is able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the database output relative to that achieved for independent searches using single sequence fragments.
Fueled by the genome projects, encyclopedic increases in the banking of newly obtained, comprehensive biological data are transforming studies of biology and medicine (1). As the postgenomic era moves into high gear, new "high throughput" technologies are allowing characterization of gene expression profiles, comparisons of genomic complements, and identification of the genetic markers associated with normal, pathological, or environmentally triggered states. Yet information derived from full analysis of genomics alone is clearly inadequate to explain the complexities of cell biology. Recent studies showing differences between the genome and the proteome suggest that the profound understanding we seek will require the complete and direct characterization of the proteome as well (2, 3).
Peptide mass mapping by MALDI-TOF MS (4) or liquid chromatography-electrospray ionization MS (5, 6), combined with interrogation of sequence databases (7-12), currently is the most widely employed strategy for the identification of expressed proteins. This methodology involves electrophoretic separation of proteins at sub-picomole levels, digestion with trypsin, and measurement of the molecular weights of the resulting peptide mixture by mass spectrometry. This strategy can routinely identify p...
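The congruence idea from the abstract of this excerpt (pooling the searches of several short fragments from one protein and favoring database entries that recur across them) can be sketched as follows. This is only an illustration under stated assumptions. The input format, the function name `congruent_hits`, the `min_fragments` threshold, and the entry identifiers are all hypothetical, and the excerpt does not describe the statistics the authors actually used, so the sketch simply counts supporting fragments.

```python
from collections import defaultdict

def congruent_hits(searches, min_fragments=2):
    """Rank database entries by how many fragment searches returned them.

    `searches` holds one dict per sequenced fragment, mapping a database
    entry id to the E-value reported for that fragment (hypothetical format).
    Entries supported by fewer than `min_fragments` fragments are dropped.
    """
    support = defaultdict(list)
    for hits in searches:
        for entry, evalue in hits.items():
            support[entry].append(evalue)

    ranked = [
        (entry, len(evalues), min(evalues))
        for entry, evalues in support.items()
        if len(evalues) >= min_fragments
    ]
    # Most supporting fragments first; ties broken by best E-value, then id.
    ranked.sort(key=lambda item: (-item[1], item[2], item[0]))
    return ranked

# Made-up results for three fragments from the same protein. None of the
# individual E-values would be considered significant on its own.
searches = [
    {"entry_A": 5.0, "entry_B": 8.0},
    {"entry_A": 3.0, "entry_C": 9.0},
    {"entry_A": 7.0, "entry_B": 6.0},
]
ranking = congruent_hits(searches)
```

In this toy input `entry_A` is returned by all three fragment searches and therefore ranks first, even though each single-fragment hit is weak. A real implementation would replace the simple count with a proper combined significance estimate.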
Genetic algorithms have properties which make them attractive in de novo drug design. Like other de novo design programs, genetic algorithms require a method to reduce the enormous search space of possible compounds. Most often this is done using information from known ligands. We have developed the ADAPT program, a genetic algorithm which uses molecular interactions evaluated with docking calculations as a fitness function to reduce the search space. ADAPT does not require information about known ligands. The program takes an initial set of compounds and iteratively builds new compounds based on the fitness scores of the previous set of compounds. We describe the particulars of the ADAPT algorithm and its application to three well-studied target systems. We also show that the strategies of enhanced local sampling and re-introducing diversity to the compound population during the design cycle provide better results than conventional genetic algorithm protocols.
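The iterative design cycle described above can be sketched as a small genetic algorithm. This is not the ADAPT implementation. The fragment alphabet, `toy_score` (a stand-in for a docking score where lower is better), and all parameter values are assumptions chosen only to make the loop runnable. Injecting a few random "immigrant" compounds each generation is one possible reading of re-introducing diversity.

```python
import random

FRAGMENTS = ("C", "N", "O", "S", "F", "Cl")   # hypothetical building blocks
TARGET = ("C", "N", "C", "O", "C", "S")       # exists only to define the toy score

def toy_score(compound):
    """Placeholder for a docking score (lower is better); NOT a real scoring function."""
    return sum(1 for got, want in zip(compound, TARGET) if got != want)

def random_compound(rng, length):
    return tuple(rng.choice(FRAGMENTS) for _ in range(length))

def evolve(score, rng, pop_size=20, length=6, generations=30,
           keep=5, immigrants=3, mutation_rate=0.2):
    """Return the best compound found and the best score seen in each generation."""
    population = [random_compound(rng, length) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score)
        history.append(score(ranked[0]))
        parents = ranked[:keep]                 # elitism: best compounds survive
        children = []
        while len(children) < pop_size - keep - immigrants:
            first, second = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = list(first[:cut] + second[cut:])
            for i in range(length):             # point mutations
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(FRAGMENTS)
            children.append(tuple(child))
        # Re-introduce diversity with freshly generated random compounds.
        fresh = [random_compound(rng, length) for _ in range(immigrants)]
        population = parents + children + fresh
    return min(population, key=score), history

best, history = evolve(toy_score, random.Random(0))
```

Because the top-ranked compounds are carried over unchanged, the best score recorded in `history` can never get worse from one generation to the next. In a docking-driven setting, `toy_score` would be replaced by a call to the docking calculation, which would dominate the run time.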