The Catalytic Site Atlas (CSA) provides catalytic residue annotation for enzymes in the Protein Data Bank. It is available online at http://www.ebi.ac.uk/thornton-srv/databases/CSA. The database consists of two types of annotated site: an original hand-annotated set containing information extracted from the primary literature, using defined criteria to assign catalytic residues, and an additional homologous set, containing annotations inferred by PSI-BLAST and sequence alignment to one of the original set. The CSA can be queried via Swiss-Prot identifier and EC number, as well as by PDB code. CSA Version 1.0 contains 177 original hand-annotated entries and 2608 homologous entries, and covers ~30% of all EC numbers found in the PDB. The CSA will be updated on a monthly basis to include homologous sites found in new PDBs, and new hand-annotated enzymes as and when their annotation is completed.
abYsis is a web-based antibody research system that includes an integrated database of antibody sequence and structure data. The system can be interrogated in numerous ways, from simple text and sequence searches to sophisticated queries that apply 3D structural constraints. The publicly available version includes pre-analysed sequence data from EMBL-ENA and Kabat as well as structure data from the PDB. A researcher's own sequences can also be analysed through the web interface. A defining characteristic of abYsis is that sequences are automatically numbered with a series of popular schemes such as Kabat and Chothia and then annotated with key information such as CDRs and potential post-translational modifications. A unique aspect of abYsis is a set of residue frequency tables for each position in an antibody, allowing 'unusual residues' (those rarely seen at a particular position) to be highlighted and decisions to be made on which mutations may be acceptable. This is especially useful when comparing antibodies from different species.
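The idea of a per-position residue frequency table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the abYsis implementation: the function names `build_frequency_table` and `unusual_residues`, the 5% threshold, and the use of plain alignment columns in place of Kabat or Chothia numbering are all hypothetical choices made for the example.

```python
from collections import Counter


def build_frequency_table(aligned_sequences):
    """Return one dict per alignment column mapping residue -> relative frequency.

    Assumes the sequences are already aligned to equal length, with '-' as the
    gap character. Gaps are ignored when computing frequencies.
    """
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be pre-aligned to the same length")
    table = []
    for column in zip(*aligned_sequences):
        residues = [res for res in column if res != "-"]
        counts = Counter(residues)
        total = len(residues)
        table.append({res: n / total for res, n in counts.items()} if total else {})
    return table


def unusual_residues(sequence, table, threshold=0.05):
    """List (position, residue, frequency) for residues rarer than the threshold.

    Positions are 1-based alignment columns (a stand-in for a real numbering
    scheme). The threshold value is an arbitrary illustrative cut-off.
    """
    flagged = []
    for position, (res, freqs) in enumerate(zip(sequence, table), start=1):
        if res == "-":
            continue
        freq = freqs.get(res, 0.0)
        if freq < threshold:
            flagged.append((position, res, freq))
    return flagged
```

Given a reference set such as `["QVQL", "QVQL", "EVQL", "QVKL"]`, calling `unusual_residues("QWQL", build_frequency_table(reference))` flags the tryptophan at column 2, since it never occurs there in the reference set. A real system would build separate tables per species and chain type, which is what makes cross-species comparison informative.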
The Single Amino Acid Polymorphism database (SAAPdb) is a new resource for the analysis and visualization of the structural effects of mutations. Our analytical approach is to map single nucleotide polymorphisms (SNPs) and pathogenic deviations (PDs) to protein structural data held within the Protein Data Bank. By mapping mutations onto protein structures, we can hypothesize whether the mutant residues will have any local structural effect that may "explain" a deleterious phenotype. Our prior work used a similar approach to analyze mutations within a single protein. An analysis of the contents of SAAPdb indicates that there are clear differences in the sequence and structural characteristics of SNPs and PDs, and that PDs are more often explained by our structural analysis. This mapping and analysis is a useful resource for the mutation community and is publicly available at http://www.bioinf.org.uk/saap/db/.
There are currently at least nine distinct glycosidase sequence families which are all known to adopt a TIM barrel fold [Henrissat,B. and Davies,G. (1997) Curr. Opin. Struct. Biol., 7, 637-644]. To explore the relationships between these enzymes and their evolution, comprehensive sequence and structure comparisons were performed, generating four distinct clusters. The first cluster, S1, comprises the alpha-amylase related enzymes, all with the retention mechanism (axial→axial). The second cluster, S2, includes two functional subgroups, one composed of various kinds of glucosidases all with the retention mechanism (equatorial→equatorial) (the so-called 4/7 superfamily), and the other subgroup including the beta-amylases with the inversion mechanism (axial→equatorial). The third cluster, S3, with the retention mechanism (equatorial→equatorial), could be subdivided, based on the catalytic residues and mechanisms, into two functional subgroups: the chitinase group, catalysed by two acidic residues on the C-termini of beta-4 and beta-6, and the hevamine group, using two acidic residues on the C-termini of beta-4 for catalysis. The fourth cluster, S4, is composed of chitobiase with the retention mechanism (equatorial→equatorial). These clusters are compared with the sequence families derived by Henrissat and coworkers. PSI-BLAST profiles and multiple alignments of tertiary structures suggest that S1 and S2 are distantly related, as are S3 and S4, which have N-acetylated substrates. This work highlights the difficulties of untangling distant evolutionary relationships in ubiquitous folds such as the TIM barrel.
Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-D-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
enzymes | function prediction | EC | PSI-BLAST
Assigning function to protein sequences continues to be of key importance (1).
Currently, most approaches to protein function prediction rely on searching sequence databases to identify homologous sequences with prior annotation. The most widely used search tools are BLAST and PSI-BLAST (2); at the National Center for Biotechnology Information alone, >70,000 BLAST searches are performed each day for the general public. It is certainly no coincidence that the BLAST algorithm was the most highly cited paper of the last decade, surpassing all biology publications (3). PSI-BLAST is an iterative method that uses results from a BLAST search to create a profile (position-specific scoring matrix). The profile is used to search the database for additional homologues, and these results can be used to further improve the profile. A profile captures family-specific information, including functionally and structurally important residue positions, and can therefore identify distant homologues not recognized by alignment to a single sequence. PSI-BLAST and powerful fold recognition methods such as GENTHREADER (4) are good at identifying distantly related proteins, even down to 10% sequence identity. However, recent studies have shown that, simply on the basis of overall simi...
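To make the notion of a profile concrete, the following Python sketch builds a simple position-specific scoring matrix from an ungapped multiple alignment and scores a sequence segment against it. It is a deliberately simplified illustration, not the PSI-BLAST algorithm: it assumes a uniform background distribution and a fixed additive pseudocount, whereas PSI-BLAST uses sequence weighting, observed background frequencies, and data-dependent pseudocounts. The names `build_pssm` and `score_segment` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds profile: one dict per column, residue -> score in bits.

    Assumptions (for illustration only): uniform background frequencies and
    a constant pseudocount added to every residue in every column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        # Count only the 20 standard residues; gaps and other symbols are skipped.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum the per-column scores of a segment with the same length as the profile."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the profile length")
    # Non-standard residues contribute a neutral score of 0.0.
    return sum(column.get(res, 0.0) for column, res in zip(pssm, segment))
```

A column dominated by one residue yields a strongly positive score for that residue and negative scores for the rest, which is how a profile encodes conserved, functionally important positions. In an iterative search, sequences scoring above a threshold would be added to the alignment and the profile rebuilt.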
Summary: We describe BiopLib, a mature C programming library for manipulating protein structure, and BiopTools, a set of command-line tools which exploit BiopLib. The library also provides a small number of functions for handling protein sequence and general purpose programming and mathematics. BiopLib transparently handles PDBML (XML) format and standard PDB files. BiopTools provides facilities ranging from renumbering atoms and residues to calculation of solvent accessibility. Availability and implementation: BiopLib and BiopTools are implemented in standard ANSI C. The core of the BiopLib library is a reliable PDB parser that handles alternate occupancies and deals with compressed PDB files and PDBML files automatically. The library is designed to be as flexible as possible, allowing users to handle PDB data as a simple list of atoms, or in a structured form using chains, residues and atoms. Many of the BiopTools command-line tools act as filters, taking a PDB (or PDBML) file as input and producing a PDB (or PDBML) file as output. All code is open source and documented using Doxygen. It is provided under the GNU Public Licence and is available from the authors’ web site or from GitHub. Contact: andrew@bioinf.org.uk
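The filter pattern described above (PDB in, PDB out) can be illustrated with a short Python sketch that renumbers atom serial numbers. This is not BiopLib or BiopTools code and does not use their API; it is an independent example that relies only on the fixed-column PDB format, in which ATOM and HETATM records carry the atom serial number in columns 7 to 11. The function names are hypothetical, and the sketch leaves TER and CONECT records untouched, so any serial numbers they reference are not updated.

```python
import sys


def renumber_atoms(lines, start=1):
    """Yield PDB lines with ATOM/HETATM serial numbers renumbered sequentially.

    Only columns 7-11 (string indices 6..10) of coordinate records are
    rewritten; every other line passes through unchanged.
    """
    serial = start
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            line = f"{line[:6]}{serial:5d}{line[11:]}"
            serial += 1
        yield line


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Act as a filter: read a PDB file on stdin, write the result to stdout."""
    stdout.writelines(renumber_atoms(stdin))

# To use this as a command-line filter, call main() from an
# `if __name__ == "__main__":` guard at the bottom of the script.
```

Because the function is a generator over lines, several such filters can be chained in a shell pipeline or composed directly in Python without holding the whole structure in memory.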