Scott C.‐H. Pegg scite author profile

We report on the development and validation of a new version of DOCK. The algorithm has been rewritten in a modular format, which allows for easy implementation of new scoring functions, sampling methods and analysis tools. We validated the sampling algorithm with a test set of 114 protein-ligand complexes. Using an optimized parameter set, we are able to reproduce the crystal ligand pose to within 2 Å of the crystal structure for 79% of the test cases using our rigid ligand docking algorithm with an average run time of 1 min per complex and for 72% of the test cases using our flexible ligand docking algorithm with an average run time of 5 min per complex. Finally, we perform an analysis of the docking failures in the test set and determine that the sampling algorithm is generally sufficient for the binding pose prediction problem for up to 7 rotatable bonds; i.e. 99% of the rigid ligand docking cases and 95% of the flexible ligand docking cases are sampled successfully. We point out that success rates could be improved through more advanced modeling of the receptor prior to docking and through improvement of the force field parameters, particularly for structures containing metal-based cofactors.
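
The success criterion in the abstract above (a docked pose within 2 Å of the crystal pose) can be illustrated with a short sketch. This is not DOCK code: the function names, the plain coordinate lists, and the assumption that atoms are listed in the same order in both poses are all hypothetical, and symmetry-equivalent atoms are not handled. No superposition is applied, because both poses are assumed to share the receptor's coordinate frame.

```python
import math

def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumes atom i in pose_a corresponds to atom i in pose_b (hypothetical
    convention) and that both poses are already in the same coordinate frame.
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))

def success_rate(pose_pairs, cutoff=2.0):
    """Fraction of (docked, crystal) pairs whose RMSD is at or below the cutoff."""
    if not pose_pairs:
        raise ValueError("need at least one pose pair")
    hits = sum(1 for docked, crystal in pose_pairs if rmsd(docked, crystal) <= cutoff)
    return hits / len(pose_pairs)

# Toy data: a two-atom ligand shifted by 1 Å (success) and by 3 Å (failure).
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(x + 1.0, y, z) for x, y, z in crystal]
far = [(x + 3.0, y, z) for x, y, z in crystal]
rate = success_rate([(near, crystal), (far, crystal)])
```

With the toy data, one of the two poses falls inside the cutoff, so `rate` is 0.5. The 2 Å default mirrors the threshold quoted in the abstract; everything else is illustrative.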

Leveraging Enzyme Structure−Function Relationships for Functional Inference and Experimental Design: The Structure−Function Linkage Database

Pegg et al. 2006

The study of mechanistically diverse enzyme superfamilies-collections of enzymes that perform different overall reactions but share both a common fold and a distinct mechanistic step performed by key conserved residues-helps elucidate the structure-function relationships of enzymes. We have developed a resource, the structure-function linkage database (SFLD), to analyze these structure-function relationships. Unique to the SFLD is its hierarchical classification scheme based on linking the specific partial reactions (or other chemical capabilities) that are conserved at the superfamily, subgroup, and family levels with the conserved structural elements that mediate them. We present the results of analyses using the SFLD in correcting misannotations, guiding protein engineering experiments, and elucidating the function of recently solved enzyme structures from the structural genomics initiative. The SFLD is freely accessible at http://sfld.rbvi.ucsf.edu.

Functional Assignment of the 20 S Proteasome from Trypanosoma brucei Using Mass Spectrometry and New Bioinformatics Approaches

Huang, Jacob, Pegg et al. 2001

Journal of Biological Chemistry

As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available data bases for protein functional identification. When either no exact match is found or when the possible matches only partially account for molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the basic additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined using mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry are able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the data base output relative to that achieved for independent searches using single sequence fragments.

Fueled by the genome projects, encyclopedic increases in the banking of newly obtained, comprehensive biological data are transforming studies of biology and medicine (1). As the postgenomic era moves into high gear, new "high throughput" technologies are allowing characterization of gene expression profiles, comparisons of genomic complements, and identification of the genetic markers associated with normal, pathological, or environmentally triggered states. Yet information derived from full analysis of genomics alone is clearly inadequate to explain the complexities of cell biology. Recent studies showing differences between the genome and the proteome suggest that the profound understanding we seek will require the complete and direct characterization of the proteome as well (2, 3).

Peptide mass mapping by MALDI-TOF MS (4) or liquid chromatography-electrospray ionization MS (5, 6), combined with interrogation of sequence data bases (7-12), currently is the most widely employed strategy for the identification of expressed proteins. This methodology involves electrophoretic separation of proteins at sub-picomole levels, digestion with trypsin, and measurement of the molecular weights of the resulting peptide mixture by mass spectrometry. This strategy can routinely identify p...
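
The abstract's idea of relying on congruence across searches of multiple fragments from a single protein can be sketched roughly as follows. This is only an illustration of the general principle, not the authors' method: the function name, the input layout (one list of database hits per fragment), and the simple counting rule are assumptions, and the statistical significance calculation mentioned in the abstract is not reproduced here.

```python
from collections import defaultdict

def congruent_hits(hits_per_fragment, min_fragments=2):
    """Rank database entries by how many distinct fragments hit them.

    hits_per_fragment maps a fragment label to the list of database entry
    identifiers returned by an independent search with that fragment
    (hypothetical layout). Entries supported by fewer than min_fragments
    fragments are dropped.
    """
    support = defaultdict(set)
    for fragment, entries in hits_per_fragment.items():
        for entry in entries:
            support[entry].add(fragment)
    ranked = [
        (entry, len(fragments))
        for entry, fragments in support.items()
        if len(fragments) >= min_fragments
    ]
    # Most-supported entries first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked

# Toy example: three short fragments from one protein, searched separately.
searches = {
    "fragment_1": ["P1", "P2"],
    "fragment_2": ["P1", "P3"],
    "fragment_3": ["P1", "P2"],
}
ranking = congruent_hits(searches)
```

In the toy example, `P1` is hit by all three fragments and `P2` by two, so they are ranked first and second, while `P3` (a single-fragment hit) is filtered out. An entry that looks weak in any one search can stand out once support from several fragments is combined.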

Untitled

2001

Genetic algorithms have properties which make them attractive in de novo drug design. Like other de novo design programs, genetic algorithms require a method to reduce the enormous search space of possible compounds. Most often this is done using information from known ligands. We have developed the ADAPT program, a genetic algorithm which uses molecular interactions evaluated with docking calculations as a fitness function to reduce the search space. ADAPT does not require information about known ligands. The program takes an initial set of compounds and iteratively builds new compounds based on the fitness scores of the previous set of compounds. We describe the particulars of the ADAPT algorithm and its application to three well-studied target systems. We also show that the strategies of enhanced local sampling and re-introducing diversity to the compound population during the design cycle provide better results than conventional genetic algorithm protocols.
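
The iterative loop described above (score a set of compounds, then build the next set from the fittest, while re-introducing diversity) can be sketched as a toy genetic algorithm. This is not the ADAPT implementation: the fragment list, the tuple representation of a compound, every parameter value, and especially `toy_score`, which merely stands in for a docking calculation, are hypothetical.

```python
import random

# Hypothetical building blocks; a compound is a fixed-length tuple of these.
FRAGMENTS = ["C", "N", "O", "S", "c1ccccc1", "C(=O)O"]

def toy_score(compound):
    """Stand-in for a docking score (lower is better); NOT a real docking run."""
    target = ("c1ccccc1", "C", "N", "C(=O)O")
    return float(sum(1 for a, b in zip(compound, target) if a != b))

def evolve(score, length=4, pop_size=20, generations=30, newcomers=2, seed=0):
    """Return the best compound found by a simple elitist genetic algorithm."""
    rng = random.Random(seed)

    def random_compound():
        return tuple(rng.choice(FRAGMENTS) for _ in range(length))

    population = [random_compound() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better-scoring half as parents (elitism).
        population.sort(key=score)
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents) - newcomers:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)          # one-point crossover
            child = list(mother[:cut] + father[cut:])
            if rng.random() < 0.3:                  # occasional point mutation
                child[rng.randrange(length)] = rng.choice(FRAGMENTS)
            children.append(tuple(child))
        # Re-introduce diversity with a few freshly generated compounds.
        fresh = [random_compound() for _ in range(newcomers)]
        population = parents + children + fresh
    return min(population, key=score)

best = evolve(toy_score)
```

Because the parents are carried over unchanged each generation, the best score never gets worse from one generation to the next. The `newcomers` step is a loose analogue of the diversity re-introduction mentioned in the abstract; the enhanced local sampling strategy is not modeled here.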

Shotgun: getting more from sequence similarity searches

Pegg, Babbitt 1999

Bioinformatics

Development and validation of a modular, extensible docking program: DOCK 5
