A computer method is presented for finding the most stable secondary structures in long single-stranded RNAs. It is 1-2 orders of magnitude faster than existing codes. The time required for its application increases as N^3 for a chain N nucleotides long. As many as 1000 nucleotides can be searched in a single run. The approach is systematic and builds an optimal structure in a straightforward inductive procedure based on an exact mathematical algorithm. Two simple half-matrices are constructed and the best folded form is read directly from the second matrix by a simple back-tracking procedure. The program utilizes published values for base-pairing energies to compute one structure with the lowest free energy.

Due to the rapid increase in our knowledge of the nucleotide sequence of many long single-stranded RNAs, it is of interest to attempt to predict the secondary and tertiary structure of these molecules. A simple method for estimating the free energy of loops found in single-stranded RNA based on their sequence was developed several years ago (1-6). By utilizing this method, the most probable loop structure for a given sequence is obtained by comparing the relative stability of all of the possible structures that can form. Although this approach alone works easily for short nucleotide sequences, longer sequences require that many alternate structures be assessed, and computer assistance becomes essential.

A number of algorithms have been developed to apply free energy rules to polynucleotide chains (2, 7-9). The basic method in all of these approaches has been similar. Perfectly matched helices in the sequence are identified. Consistent sets of these helices are then assembled, and the overall free energy of each assembled structure is calculated individually. For long chains, the number of combinations to be examined in this approach is very large (10, 11) and the time required for the calculations is extremely long.

We have developed an approach to computer folding of large polynucleotide chains in which the algorithm is about 100 times faster than existing approaches. Two simple half-matrices are constructed by an inductive procedure which considers the energy contributions of individual base pairs. The loop structure with the lowest free energy is read directly from the second matrix in a simple fashion. The basic algorithm and its mathematical proof have been presented (12). It was developed initially simply to maximize base pairing along a polynucleotide chain. More recently, we realized that the rules for calculating loop stability based on free energy can be incorporated into the algorithm as well.

This presentation provides a simplified explanation of the original algorithm for maximal matching as well as a description of the procedure developed for incorporating energy rules.

METHODS

Basic Formulation of the Method. The algorithm is designed to evaluate the contribution of individual base pairs to the secondary structure of a polynucleotide chain. The basic principle on which it rests is best understood by c...
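The maximal-matching idea summarized above (fill a half-matrix inductively over longer and longer subsequences, then read the structure back by back-tracking) can be illustrated with a minimal sketch. It uses the widely taught textbook recurrence for maximizing nested base pairs and is not a reproduction of the authors' program: the function name `max_pairs`, the allowed-pair set `PAIRS` (Watson-Crick plus G-U), the `min_loop` parameter, and the use of a single score matrix with an on-the-fly traceback are all assumptions made for illustration. The three nested loops give the N^3 time growth mentioned in the abstract.

```python
# Illustrative sketch only: textbook maximal base-pair matching,
# with hypothetical names (max_pairs, PAIRS, min_loop).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq, min_loop=3):
    """Return (count, pairs) for one maximal set of nested base pairs."""
    n = len(seq)
    # score[i][j] = max number of pairs within seq[i..j]; only i <= j is used,
    # so the table is effectively an upper half-matrix.
    score = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # grow subsequence length
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]               # case 1: base j is unpaired
            for k in range(i, j - min_loop):     # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = score[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + score[k + 1][j - 1])
            score[i][j] = best

    # Back-tracking: recover one optimal set of pairs from the filled table.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:                    # too short to hold a pair
            continue
        if score[i][j] == score[i][j - 1]:       # j unpaired in an optimum
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = score[i][k - 1] if k > i else 0
                if left + 1 + score[k + 1][j - 1] == score[i][j]:
                    pairs.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (score[0][n - 1] if n else 0), sorted(pairs)


count, pairs = max_pairs("GGGAAAUCC")
print(count, pairs)  # 3 [(0, 8), (1, 7), (2, 6)]
```

Replacing the "+1 per pair" score with published pairing and loop energies, and maximizing stability instead of pair count, is the kind of extension the abstract describes; the sketch does not attempt it.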
A number of heuristic descriptors have been developed previously in conjunction with the mfold package that describe the propensity of individual bases to participate in base pairs and whether or not a predicted helix is "well-determined." They were developed for the "energy dot plot" output of mfold. Two descriptors, P-num and H-num, are used to measure the level of promiscuity in the association of any given nucleotide or helix with alternative complementary pairs. The third descriptor, S-num, measures the propensity of bases to be single-stranded. In the current work, we describe a series of programs that were developed in order to annotate individual structures with "well-definedness" information. We use color annotation to present the information. The programs can annotate PostScript files that are created by the mfold package or the PostScript secondary structure plots produced by the Weiser and Noller program XRNA (Weiser B, Noller HF, 1995, XRNA: Auto-interactive program for modeling RNA, The Center for Molecular Biology of RNA, Santa Cruz, California: University of California; Internet: ftp://fangio.ucsc.edu/pub/XRNA). In addition, these programs can annotate ss files that serve as input to XRNA. The annotation package can also handle structure comparison with a reference structure. This feature can be used to compare predicted structure with a phylogenetically deduced model, to compare two different predicted foldings, and to identify conformational changes that are predicted between wild-type and mutant RNAs. We provide several examples of application. Predicted structures of two RNase P RNAs were colored with P-num information and further annotated with comparative information. The comparative model of a 16S rRNA was annotated with P-num information from mfold and with base pair probabilities obtained from the Vienna RNA folding package. 
Further annotation adds comparisons with the optimal foldings obtained from mfold and the Vienna package, respectively. The results of all of these analyses are discussed in the context of the reliability of structure prediction.
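As a rough illustration of how a promiscuity descriptor such as P-num could be computed, the sketch below counts, for each base, how many distinct pairs it can form in foldings whose free energy lies within a chosen increment of the optimum. This is one plausible reading of the description above, not mfold's actual code: the function name `p_num`, the dictionary-based "energy dot plot" input, and the example energies are all hypothetical.

```python
# Hypothetical sketch: per-base count of alternative pairing partners.
from collections import Counter


def p_num(dot_plot, optimal_energy, increment):
    """dot_plot maps (i, j) to the best free energy of any folding containing
    that pair (an assumed representation of an energy dot plot)."""
    counts = Counter()
    for (i, j), energy in dot_plot.items():
        # Keep only pairs that occur in near-optimal foldings.
        if energy <= optimal_energy + increment:
            counts[i] += 1
            counts[j] += 1
    return dict(counts)


# Made-up energies (kcal/mol) for illustration only.
example = {(1, 10): -5.0, (1, 12): -4.5, (2, 9): -5.0, (3, 8): -2.0}
print(p_num(example, optimal_energy=-5.0, increment=1.0))
```

Under this reading, a base with a low count has few competing partners and would be colored as well-determined, while a high count flags a promiscuous position.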
Recent structural analyses of genomic RNAs from RNA coliphages suggest that both well-determined base-paired helices and well-determined structural domains, as identified by 'energy dot plot' analysis using the RNA folding package mfold, are likely to be predicted correctly. To test these observations with another group of large RNAs, we have analyzed 15 ribosomal RNAs. Published secondary structure models that were derived by comparative sequence analysis were used to evaluate the predicted structures. Both the optimal predicted fold and the predicted 'energy dot plot' of each sequence were examined. Each prediction was obtained from a single computer run on an entire ribosomal RNA sequence. All predicted base pairs in optimal foldings were examined for agreement with proven base pairs in the comparative models. Our analyses show that the overall correspondence between the predicted and comparative models varies for different RNAs and ranges from a low of 27% to a high of 70%, with a mean value of 49%. The correspondence improves to a mean value of 81% when the analysis is limited to well-determined helices. In addition to well-determined helices, large well-determined structural domains can be observed in 'energy dot plots' of some 16S ribosomal RNAs. The predicted domains correspond closely with structural domains that are found by the comparative method in the same RNAs. Our analyses also show that measuring the agreement between predicted and comparative secondary structure models underestimates the reliability of structural prediction by mfold.
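The kind of correspondence figure reported above can be sketched as a simple set comparison between predicted and comparative base pairs. The abstract does not state the exact formula, so the choice of denominator here (the number of predicted pairs) is an assumption, as are the function name `percent_agreement` and the example pair lists.

```python
# Hypothetical sketch: percentage of predicted base pairs that also appear
# in a reference (comparative) model. The denominator is an assumption.
def percent_agreement(predicted, reference):
    # Normalize each pair so (i, j) and (j, i) compare equal.
    predicted = {tuple(sorted(p)) for p in predicted}
    reference = {tuple(sorted(p)) for p in reference}
    if not predicted:
        return 0.0
    return 100.0 * len(predicted & reference) / len(predicted)


predicted_pairs = [(1, 10), (2, 9), (3, 8), (20, 30)]
comparative_pairs = [(1, 10), (2, 9), (4, 7)]
print(percent_agreement(predicted_pairs, comparative_pairs))  # 50.0
```

Restricting `predicted` to pairs that belong to well-determined helices before calling the function would correspond to the filtered analysis described in the abstract.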