Since 1989, about 570 different p53 mutations have been identified in more than 8000 human cancers. A database of these mutations was initiated by M. Hollstein and C. C. Harris in 1990. This database originally consisted of a list of somatic point mutations in the p53 gene of human tumors and cell lines, compiled from the published literature and made available in a standard electronic form. The database is maintained at the International Agency for Research on Cancer (IARC) and updated versions are released twice a year (January and July). The current version (July 1997) contains records on 6800 published mutations and will surpass the 8000 mark in the January 1998 release. The database now contains information on somatic and germline mutations in a new format to facilitate data retrieval. In addition, new tools are constructed to improve data analysis, such as a Mutation Viewer Java applet developed at the European Bioinformatics Institute (EBI) to visualise the location and impact of mutations on p53 protein structure. The database is available in different electronic formats at IARC (http://www.iarc.fr/p53/homepage.htm) or from the EBI server (http://www.ebi.ac.uk). The IARC p53 website also provides reports on database analysis and links with other p53 sites as well as with related databases. In this report, we describe the criteria for inclusion of data, the revised format and the new visualisation tools. We also briefly discuss the relevance of p53 mutations to clinical and biological questions.
We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.
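The first two steps described above (grouping chains at 35% or more sequence identity, then picking one representative per family by resolution and R-factor) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Chain` record, the `identity` lookup of precomputed pairwise identities, the single-linkage grouping, and the tie-breaking order (resolution first, then R-factor) are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one protein chain."""
    name: str
    resolution: float  # in angstroms; lower is better
    r_factor: float    # lower is better


def families(chains, identity, threshold=35.0):
    """Group chains by single linkage (an assumption): two chains share a
    family if a path of pairwise identities >= threshold connects them.
    `identity` maps frozenset({name_a, name_b}) to percent identity."""
    parent = {c.name: c.name for c in chains}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(chains, 2):
        if identity.get(frozenset((a.name, b.name)), 0.0) >= threshold:
            parent[find(a.name)] = find(b.name)

    groups = {}
    for c in chains:
        groups.setdefault(find(c.name), []).append(c)
    return list(groups.values())


def representative(family):
    """Pick the chain with the best resolution, breaking ties by R-factor
    (the ordering of the two criteria is assumed, not taken from the text)."""
    return min(family, key=lambda c: (c.resolution, c.r_factor))
```

Calling `representative` on each group returned by `families` would give one chain per family, which corresponds to the nonhomologous set in the text. The later structure-comparison steps are not covered by this sketch.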
Although it is known that three-dimensional structure is well conserved during the evolutionary development of proteins, there have been few studies that consider other parameters apart from divergence of the main-chain coordinates. In this study, we align the structures of 90 pairs of homologous proteins having sequence identities ranging from 5 to 100%. Their structures are compared as a function of sequence identity, including not only consideration of Cα coordinates but also accessibility, Ooi numbers, secondary structure, and side-chain angles. We discuss how these properties change as the sequences become less similar. This will be of practical use in homology modeling, especially for modeling very distantly related or analogous proteins. We also consider how the average size and number of insertions and deletions vary as sequences diverge. This study presents further quantitative evidence that structure is remarkably well conserved in detail, as well as at the topological level, even when the sequences do not show similarity that is significant statistically.
A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core.
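The idea of summarizing a bundle of vectors by a resultant plus an error term can be illustrated with a small sketch. The abstract does not give the formulas, so the definitions here are assumptions chosen for illustration: the resultant is taken as the mean vector of the bundle and the error as the root-mean-square deviation of the bundle from that mean. The function name `resultant_and_error` is hypothetical.

```python
import math


def resultant_and_error(bundle):
    """Summarize equivalent interatomic vectors from several structures.

    `bundle` is a non-empty list of 3D vectors (x, y, z). Returns
    (resultant, error), where the resultant is the mean vector and the
    error is the RMS deviation from it. Both definitions are assumed
    for this sketch, not taken from the source."""
    n = len(bundle)
    mean = tuple(sum(v[i] for v in bundle) / n for i in range(3))
    squared = sum(
        sum((v[i] - mean[i]) ** 2 for i in range(3)) for v in bundle
    )
    return mean, math.sqrt(squared / n)
```

Under these assumed definitions, a coherent bundle (a conserved position) yields a small error and a scattered bundle (a variable position) yields a large one, so the error could be used to down-weight variable positions during comparison.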
An algorithm is described for automatically generating protein topology cartoons. This algorithm optimally places circles and triangles depicting alpha-helices and beta-strands, respectively, giving a pictorial topological summary of any protein structure. Beta-sheets, sandwiches and barrels are automatically identified and represented using special templates. The output from this algorithm may be controlled by adjustment of variable weights during the optimization step, giving a preferred result. The rules for generating protein topology cartoons, including consideration of the handedness of local structure motifs, are discussed. The design of this algorithm is completely general and is easily adapted to include further rules that dictate the generation of the cartoons.
The topology of a protein structure is a highly simplified description of its fold including only the sequence of secondary structure elements, and their relative spatial positions and approximate orientations. This information can be embodied in a two-dimensional diagram of protein topology, called a TOPS cartoon. These cartoons are useful for the understanding of particular folds and making comparisons between folds. Here we describe a new algorithm for the production of TOPS cartoons, which is more robust than those previously available, and has a much higher success rate. This algorithm has been used to produce a database of protein topology cartoons that covers most of the data bank of known protein structures.
Keywords: protein structure; topological diagram; topological representation; topology; TOPS
In recent years, the experimental techniques of NMR and X-ray crystallography have delivered a large number of protein 3D structures. Knowledge of these structures is of central importance to studies of protein function and evolution, particularly since it has become apparent that structure is much more strongly conserved through evolution than sequence. With the increasing number of structures has come a need for better tools for automated structural analysis and visualization.
Visualization of the 3D folds of proteins can be difficult. Three-dimensional models can be viewed using graphics programs like RASMOL (Sayle & Milner-White, 1995), and folds can be made clearer by display options that show only the peptide backbone with secondary structures represented by ribbons. However, these representations still have to be rotated to find a good viewing angle, and manual comparison of different structures quickly becomes very difficult when more than a few are involved.
Comparison of the protein folds is much easier when they are reduced to a topological level, at which details like the lengths and precise orientations of secondary structures, and structures of connecting loops, are ignored. Such a representation is embodied in a 2D TOPS cartoon. Some examples are shown in Figure 1 along with the 3D structures for comparison. The cartoons show the secondary structure elements (SSEs) and how they are connected in sequence. Also represented are the relative spatial positions and approximate orientations of the SSEs. Strands that are linked by hydrogen bond ladders are adjacent to each other in the cartoon, and SSEs that are otherwise spatial neighbors in the fold are plotted close together. Orientations of the SSEs are shown in the approximation that they have one of two directions, "up" (out of the page) or "down" (into the page). Figure 1 clearly shows how TOPS cartoons simplify the understanding of the folding topology of a single structure, and enable comparison between related structures. The folds in Figure 1 all contain a "jelly roll" folding motif (Richardson, 1981; Stirk et al., 1992) highlighted as shaded triangles, a fact that is much clearer from the TOPS cartoons than the 3D structures.
The first TOPS ca...
The use of neural networks to improve empirical secondary structure prediction is explored with regard to the identification of the position and conformational class of beta-turns, a four-residue chain reversal. Recently an algorithm was developed for beta-turn predictions based on the empirical approach of Chou and Fasman using different parameters for three classes (I, II and non-specific) of beta-turns. In this paper, using the same data, an alternative empirical prediction method is derived with neural networks, a general learning technique used extensively in artificial intelligence. Thus the results of the two approaches can be compared. The most severe test of prediction accuracy is the percentage of turn predictions that are correct, and the neural network gives an overall improvement from 20.6% to 26.0%. The proportion of correctly predicted residues is 71%, compared to a chance level of about 58%. Thus neural networks provide a method of obtaining more accurate predictions from empirical data than a simpler method of deriving propensities.
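The two accuracy measures quoted above differ in what they count, which a short sketch may make clearer. This is a residue-level simplification under stated assumptions: each residue carries a binary label (1 for turn, 0 for non-turn), whereas the paper may score whole turns and their class. The function names `fraction_of_predictions_correct` and `fraction_of_residues_correct` are hypothetical.

```python
def fraction_of_predictions_correct(predicted, actual):
    """Of the residues predicted as turn (1), the fraction that really are
    turn. This corresponds to the stricter measure in the text, in a
    per-residue form assumed for this sketch."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    called = sum(1 for p in predicted if p == 1)
    return hits / called if called else 0.0


def fraction_of_residues_correct(predicted, actual):
    """Fraction of all residues whose predicted label matches the actual
    label, counting both turn and non-turn residues."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual) if actual else 0.0
```

Because non-turn residues usually dominate, the second measure can look high even when few of the predicted turns are right, which is consistent with the text calling the first measure the more severe test.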
The European Bioinformatics Institute (EBI) maintains and distributes the EMBL Nucleotide Sequence database, Europe's primary nucleotide sequence data resource. The EBI also maintains and distributes the SWISS-PROT Protein Sequence database, in collaboration with Amos Bairoch of the University of Geneva. Over fifty additional specialist molecular biology databases, as well as software and documentation of interest to molecular biologists, are available. The EBI network services include database searching and sequence similarity searching facilities.