Given an RNA sequence and two designated secondary structures A, B, we describe a new algorithm that computes a nearly optimal folding pathway from A to B. The algorithm, RNAtabupath, employs a tabu semi-greedy heuristic, known to be an effective search strategy in combinatorial optimization. Folding pathways, sometimes called routes or trajectories, are computed by RNAtabupath in a fraction of the time required by the barriers program of the Vienna RNA Package. We benchmark RNAtabupath against other algorithms that compute low energy folding pathways between experimentally known structures of several conformational switches. The RNApathfinder web server, source code for algorithms to compute and analyze pathways, and supplementary data are available at http://bioinformatics.bc.edu/clotelab/RNApathfinder.
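The idea of a semi-greedy pathway search can be illustrated with a small sketch. This is not the RNAtabupath implementation: it is restricted to direct moves (remove a base pair of A that is not in B, or add a base pair of B), so no state is ever revisited and the tabu list is omitted. The energy function `toy_energy` (minus the number of base pairs) and the parameter `k` are assumptions made purely for illustration.

```python
import random


def compatible(pair, structure):
    """True if `pair` shares no position with, and does not cross, any pair in `structure`."""
    i, j = pair
    for k, l in structure:
        if len({i, j, k, l}) < 4:                  # a position would be paired twice
            return False
        if i < k < j < l or k < i < l < j:         # crossing pairs (pseudoknot)
            return False
    return True


def toy_energy(structure):
    """Hypothetical energy: each base pair contributes -1 (not the Turner model)."""
    return -len(structure)


def semi_greedy_direct_path(a, b, energy=toy_energy, k=2, seed=0):
    """Walk from structure a to structure b, one base pair at a time.

    At each step the candidate moves are ranked by energy and one of the
    k best is chosen at random (the semi-greedy choice).
    """
    rng = random.Random(seed)
    a, b = frozenset(a), frozenset(b)
    cur, path = a, [a]
    while cur != b:
        moves = [cur - {p} for p in cur - b]                            # removals
        moves += [cur | {p} for p in b - cur if compatible(p, cur)]     # additions
        moves.sort(key=lambda s: (energy(s), sorted(s)))
        cur = rng.choice(moves[:k])
        path.append(cur)
    return path


def barrier(path, energy=toy_energy):
    """Highest energy reached along the path, relative to the starting structure."""
    return max(energy(s) for s in path) - energy(path[0])


if __name__ == "__main__":
    A = {(0, 9), (1, 8)}
    B = {(4, 13), (5, 12)}
    route = semi_greedy_direct_path(A, B)
    for s in route:
        print(sorted(s), toy_energy(s))
    print("barrier:", barrier(route))
```

In this example every base pair of B crosses every base pair of A, so both pairs of A must be removed before any pair of B can be added, and the toy barrier is 2 regardless of the random choices.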
DIAL (dihedral alignment) is a web server that provides public access to a new dynamic programming algorithm for pairwise 3D structural alignment of RNA. DIAL achieves quadratic time by performing an alignment that accounts for (i) pseudo-dihedral and/or dihedral angle similarity, (ii) nucleotide sequence similarity and (iii) nucleotide base-pairing similarity. DIAL provides access to three alignment algorithms: global (Needleman–Wunsch), local (Smith–Waterman) and semiglobal (modified to yield motif search). Suboptimal alignments are optionally returned, and also Boltzmann pair probabilities Pr(ai, bj) for aligned positions ai, bj from the optimal alignment. If a non-zero suboptimal alignment score ratio is entered, then the semiglobal alignment algorithm may be used to detect structurally similar occurrences of a user-specified 3D motif. The query motif may be contiguous in the linear chain or fragmented in a number of noncontiguous regions. The DIAL web server provides graphical output which allows the user to view, rotate and enlarge the 3D superposition for the optimal (and suboptimal) alignment of query to target. Although graphical output is available for all three algorithms, the semiglobal motif search may be of most interest in attempts to identify RNA motifs. DIAL is available at http://bioinformatics.bc.edu/clotelab/DIAL.
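A minimal sketch of a quadratic-time global (Needleman-Wunsch) alignment over residues annotated with an angle may help make the approach concrete. The scoring function `similarity`, the weight `w_seq` and the gap penalty are assumptions chosen for illustration, not DIAL's actual scoring; base-pairing similarity and the local and semiglobal variants are left out.

```python
def angle_diff(x, y):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(x - y) % 360.0
    return min(d, 360.0 - d)


def similarity(p, q, w_seq=1.0):
    """Hypothetical score for aligning residues p and q.

    Each residue is a tuple (nucleotide, pseudo-dihedral angle in degrees).
    The angle term ranges from +1 (identical) to -1 (opposite); a matching
    nucleotide adds w_seq.
    """
    ang = 1.0 - angle_diff(p[1], q[1]) / 90.0
    seq = w_seq if p[0] == q[0] else 0.0
    return ang + seq


def global_align(s, t, gap=-1.0):
    """Needleman-Wunsch alignment; returns (score, list of aligned index pairs)."""
    n, m = len(s), len(t)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover aligned positions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + similarity(s[i - 1], t[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    query = [("G", 170.0), ("A", -60.0), ("C", 55.0)]
    target = [("G", 165.0), ("U", 20.0), ("A", -65.0), ("C", 60.0)]
    print(global_align(query, target))
```

The table has (n+1)(m+1) cells and each is filled in constant time, which is where the quadratic running time comes from.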
RNA shapes, introduced by Giegerich et al. (2004), provide a useful classification of the branching complexity for RNA secondary structures. In this paper, we derive an exact value for the asymptotic number of RNA shapes, by relying on an elegant relation between non-ambiguous, context-free grammars, and generating functions. Our results provide a theoretical upper bound on the length of RNA sequences amenable to probabilistic shape analysis (Steffen et al., 2006; Voss et al., 2006), under the assumption that any base can basepair with any other base. Since the relation between context-free grammars and asymptotic enumeration is simple, yet not well-known in bioinformatics, we give a self-contained presentation with illustrative examples. Additionally, we prove a surprising 1-to-1 correspondence between pi-shapes and Motzkin numbers.
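For readers unfamiliar with Motzkin numbers, the following short sketch computes them from the standard recurrence M(0) = M(1) = 1, M(n+1) = M(n) + Σ M(i)·M(n−1−i) for i = 0..n−1. The function name `motzkin` is an assumption, and the exact indexing of the correspondence with pi-shapes is given in the paper and not reproduced here.

```python
def motzkin(n_max):
    """Return the Motzkin numbers M(0), ..., M(n_max) as a list."""
    m = [1, 1]
    for n in range(1, n_max):
        # M(n+1) = M(n) + sum_{i=0}^{n-1} M(i) * M(n-1-i)
        m.append(m[n] + sum(m[i] * m[n - 1 - i] for i in range(n)))
    return m[: n_max + 1]


if __name__ == "__main__":
    print(motzkin(8))  # [1, 1, 2, 4, 9, 21, 51, 127, 323]
```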
An RNA secondary structure is locally optimal if there is no lower energy structure that can be obtained by the addition or removal of a single base pair, where energy is defined according to the widely accepted Turner nearest neighbor model. Locally optimal structures form kinetic traps, since any evolution away from a locally optimal structure must involve energetically unfavorable folding steps. Here, we present a novel, efficient algorithm to compute the partition function over all locally optimal secondary structures of a given RNA sequence. Our software, RNAlocopt runs in time and space. Additionally, RNAlocopt samples a user-specified number of structures from the Boltzmann subensemble of all locally optimal structures. We apply RNAlocopt to show that (1) the number of locally optimal structures is far fewer than the total number of structures; indeed, the number of locally optimal structures is approximately equal to the square root of the number of all structures, (2) the structural diversity of this subensemble may be either similar to or quite different from the structural diversity of the entire Boltzmann ensemble, a situation that depends on the type of input RNA, (3) the (modified) maximum expected accuracy structure, computed by taking into account base pairing frequencies of locally optimal structures, is a more accurate prediction of the native structure than other current thermodynamics-based methods. The software RNAlocopt constitutes a technical breakthrough in our study of the folding landscape for RNA secondary structures. For the first time, locally optimal structures (kinetic traps in the Turner energy model) can be rapidly generated for long RNA sequences, which was previously impossible with methods that involved exhaustive enumeration. Use of locally optimal structures leads to state-of-the-art secondary structure prediction, as benchmarked against methods involving the computation of minimum free energy and of maximum expected accuracy.
Web server and source code available at http://bioinformatics.bc.edu/clotelab/RNAlocopt/.
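The definition of local optimality can be made concrete with a brute-force sketch for very short sequences. This is the exhaustive enumeration that RNAlocopt is designed to avoid, and it uses a hypothetical energy `toy_energy` (minus the number of base pairs) in place of the Turner model; under that toy energy a structure is locally optimal exactly when no further base pair can be added. All function names are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def can_pair(seq, i, j):
    """True if positions i < j may form a base pair."""
    return j - i > MIN_LOOP and (seq[i], seq[j]) in PAIRS


def structures(seq, i=0, j=None):
    """Yield every secondary structure on seq[i..j] as a frozenset of pairs."""
    if j is None:
        j = len(seq) - 1
    if i >= j:
        yield frozenset()
        return
    yield from structures(seq, i + 1, j)             # position i unpaired
    for k in range(i + 1, j + 1):                    # position i paired with k
        if can_pair(seq, i, k):
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def compatible(pair, structure):
    """True if `pair` shares no position with, and does not cross, any existing pair."""
    i, j = pair
    for k, l in structure:
        if len({i, j, k, l}) < 4 or i < k < j < l or k < i < l < j:
            return False
    return True


def toy_energy(structure):
    """Hypothetical energy: -1 per base pair (not the Turner model)."""
    return -len(structure)


def is_locally_optimal(seq, structure, energy=toy_energy):
    """True if no single base-pair addition or removal lowers the energy."""
    e0 = energy(structure)
    for p in structure:
        if energy(structure - {p}) < e0:
            return False
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            p = (i, j)
            if p not in structure and can_pair(seq, i, j) and compatible(p, structure):
                if energy(structure | {p}) < e0:
                    return False
    return True


if __name__ == "__main__":
    seq = "GGGAAAUCC"
    everything = list(structures(seq))
    traps = [s for s in everything if is_locally_optimal(seq, s)]
    print(len(everything), "structures,", len(traps), "locally optimal")
```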
A previous analysis of serum insulin-like growth factor I (IGF-I) levels in a mouse population (n = 961) derived from a cross of (BALB/cJ × C57BL/6J) F1 females and (C3H/HeJ × DBA/2J) F1 males documented quantitative trait loci (QTL) on chromosomes 1, 10, and 17. We employed a newly developed, random walk-based method to search for three- and four-way allelic combinations that might influence IGF-I levels through nonadditive (conditional or epistatic) interactions among 185 genotyped biallelic loci and with significance defined by experimentwide permutation (P < 0.05). We documented a three-locus combination in which an epistatic interaction between QTL on paternal-derived chromosomes 5 and 18 had an opposite effect on the phenotype based on the allele inherited at a third locus on maternal-derived chromosome 17. The search also revealed three four-locus combinations that influence IGF-I levels through nonadditive genetic interactions. In two cases, the four-allele combinations were associated with animals having high levels of IGF-I, and, in the third case, a four-allele combination was associated with animals having low IGF-I levels. The multiple-locus genome scan algorithm revealed new IGF-I QTL on chromosomes 2, 4, 5, 7, 8, and 12 that had not been detected in the single-locus genome search and showed that levels of this hormone can be regulated by complex, nonadditive interactions among multiple loci. The analysis method can detect multilocus interactions in a genome scan experiment and may provide new ways to explore the genetic architecture of complex physiological phenotypes.
Keywords: quantitative trait loci; epistasis; gene interactions
Many traits of interest to biological and medical science are determined by the interaction of multiple factors. The genetic and environmental variation among individuals in a population results in a broadened phenotype range and, in many cases, obscures the relationships connecting the causative factors.
Although the paradigm of "one gene-one phenotype" has been successfully exploited in experimental biology, the multiple genes that underlie interindividual variation in most traits remain unresolved. Consequently, a significant challenge remains for biomedical research to develop the tools for the deconstruction and understanding of the genetic network, or architecture, of complex traits (16, 21, 25, 33, 41). In experimental organisms, the individual causative genes that underlie a phenotype can be identified through conventional linkage studies, targeted mutational analysis, or quantitative trait locus (QTL) analysis (16, 20). After identification, the single genes can be shown, by experiment, to interact within a more complex functional pathway or network. Alternatively, interconnected genetic factors can be identified by searches for second-site modifier genes. In model organisms such as yeast, Caenorhabditis elegans, and Drosophila melanogaster, the modifier gene strategy has been exceptionally valuable (1, 35). The mapping and molecular cloning of second-site modifiers of specific p...
We describe several dynamic programming segmentation algorithms to segment RNA secondary and tertiary structures into distinct domains. For this purpose, we consider fitness functions that variously depend on (i) base pairing probabilities in the Boltzmann low energy ensemble of structures, (ii) contact maps inferred from 3-dimensional structures, and (iii) Voronoi tessellation computed from 3-dimensional structures. Segmentation algorithms include a direct dynamic programming method, previously discovered by Bellman and by Finkelstein and Roytberg, as well as two novel algorithms: a parametric algorithm to compute the optimal segmentation into k classes, for each value k, and an algorithm that simultaneously computes the optimal segmentation of all subsegments. Since many non-coding RNA gene finders scan the genome by a moving window method, reporting high-scoring windows, we apply structural segmentation to determine the most likely 5′ and 3′ boundaries of precursor microRNAs. When tested on all precursor microRNAs of length at most 100 nt from the Rfam database, benchmarking studies indicate that segmentation determines the 5′ boundary with discrepancy (absolute value of difference between predicted and real boundaries) having mean −0.640 (stdev 15.196) and the 3′ boundary with discrepancy having mean −0.266 (stdev 17.415). This yields a sensitivity of 0.911 and positive predictive value of 0.906 for determination of exact boundaries of precursor microRNAs within a window of approximately 900 nt. Additionally, by comparing the manual segmentation of Jaeger et al. with our optimal structural segmentation of 16S and 16S-like rRNA of E. coli, rat mitochondria, Halobacterium volcanii, and Chlamydomonas reinhardii chloroplast into 4 segments, we establish the usefulness of (automated) structural segmentation in decomposing large RNA structures into distinct domains. Availability: Source code for all algorithms is available at http://bioinformatics.bc.edu/clotelab/.
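The direct dynamic programming method for segmentation into k classes can be sketched as follows. The cost used here (sum of squared deviations from the segment mean over a list of per-position values) is a hypothetical stand-in for the fitness functions named above, and the function name `segment` is an assumption; the parametric and all-subsegment algorithms are not shown.

```python
def segment(values, k):
    """Split `values` into k contiguous, non-empty segments of minimum total cost.

    Returns (cost, bounds) where bounds is a list of half-open index ranges (i, j).
    Requires 1 <= k <= len(values).
    """
    n = len(values)
    # Prefix sums give the cost of any segment in constant time.
    s1, s2 = [0.0], [0.0]
    for v in values:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean of values[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimum cost of splitting values[:j] into c segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    back[c][j] = i

    # Follow the back pointers to recover the segment boundaries.
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds


if __name__ == "__main__":
    signal = [0, 0, 0, 5, 5, 5, 9, 9]
    print(segment(signal, 3))
```

Because the table row for c segments is built from the row for c − 1, running the loop up to a maximum k yields the optimal segmentation for every smaller number of classes as a by-product.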
Background: It has been increasingly appreciated that coding sequences harbor regulatory sequence motifs in addition to encoding protein. These sequence motifs are expected to be overrepresented in nucleotide sequences bound by a common protein or small RNA. However, detecting overrepresented motifs has been difficult because of interference by constraints at the protein level. Sampling-based approaches to solve this problem based on codon-shuffling have been limited to exploring only an infinitesimal fraction of the sequence space and by their use of parametric approximations. Results: We present a novel O(N (log N)^2)-time algorithm, CodingMotif, to identify nucleotide-level motifs of unusual copy number in protein-coding regions. Using a new dynamic programming algorithm we are able to exhaustively calculate the distribution of the number of occurrences of a motif over all possible coding sequences that encode the same amino acid sequence, given a background model for codon usage and dinucleotide biases. Our method takes advantage of the sparseness of loci where a given motif can occur, greatly speeding up the required convolution calculations. Knowledge of the distribution allows one to assess the exact non-parametric p-value of whether a given motif is over- or under-represented. We demonstrate that our method identifies known functional motifs more accurately than sampling and parametric-based approaches in a variety of coding datasets of various sizes, including ChIP-seq data for the transcription factors NRSF and GABP. Conclusions: CodingMotif provides a theoretically and empirically demonstrated advance for the detection of motifs overrepresented in coding sequences. We expect CodingMotif to be useful for identifying motifs in functional genomic datasets such as DNA-protein binding, RNA-protein binding, or microRNA-RNA binding within coding regions. A software implementation is available at http://bioinformatics.bc.edu/chuanglab/codingmotif.tar
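To illustrate what an exact distribution of motif counts over synonymous coding sequences looks like, here is a small dynamic programming sketch. It is not the CodingMotif algorithm: it assumes uniform codon usage with no dinucleotide bias, uses the standard genetic code, and does not use the sparse convolution that gives the stated running time. The names `motif_count_distribution` and `upper_tail` are assumptions.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = defaultdict(list)
for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO):
    CODONS[aa].append(b1 + b2 + b3)


def motif_count_distribution(peptide, motif):
    """Return {count: probability} of (overlapping) motif occurrences.

    The probability is taken over all coding sequences for `peptide`,
    each synonymous codon being equally likely (an assumption).
    """
    keep = len(motif) - 1
    # State: (last `keep` nucleotides, occurrences so far) -> probability.
    states = {("", 0): 1.0}
    for aa in peptide:
        codons = CODONS[aa]
        nxt = defaultdict(float)
        for (suffix, count), prob in states.items():
            for codon in codons:
                text = suffix + codon
                # The suffix is shorter than the motif, so every occurrence
                # found in `text` ends inside the new codon and is new.
                new = sum(
                    text.startswith(motif, s)
                    for s in range(len(text) - len(motif) + 1)
                )
                tail = text[max(0, len(text) - keep):]
                nxt[(tail, count + new)] += prob / len(codons)
        states = nxt
    dist = defaultdict(float)
    for (_, count), prob in states.items():
        dist[count] += prob
    return dict(dist)


def upper_tail(dist, observed):
    """Probability of seeing at least `observed` occurrences."""
    return sum(p for c, p in dist.items() if c >= observed)


if __name__ == "__main__":
    d = motif_count_distribution("KKEL", "AAG")
    print(sorted(d.items()))
    print(upper_tail(d, 2))
```

The number of states is bounded by the number of distinct suffixes times the maximum count, so the table stays small even though the number of synonymous sequences grows exponentially with peptide length.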
Background: Since RNA molecules regulate genes and control alternative splicing by allostery, it is important to develop algorithms to predict RNA conformational switches. Some tools, such as , can be used to predict potential conformational switches; nevertheless, no existing tool can detect general (i.e., not family specific) entire riboswitches (both aptamer and expression platform) with accuracy. Thus, the development of additional algorithms to detect conformational switches seems important, especially since the difference in free energy between the two metastable secondary structures may be as large as 15-20 kcal/mol. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accuracy (MEA) structure, rather than the minimum free energy (MFE) structure. Results: Given an arbitrary RNA secondary structure S0 for an RNA nucleotide sequence a = a1, ..., an, we say that another secondary structure S of a is a k-neighbor of S0, if the base pair distance between S0 and S is k. In this paper, we prove that the Boltzmann probability of all k-neighbors of the minimum free energy structure S0 can be approximated with accuracy ε and confidence 1 - p, simultaneously for all 0 ≤ k < K, by a relative frequency count over N sampled structures, provided that $N > N(\varepsilon, p, K) = \frac{\Phi^{-1}\left(\frac{p}{2K}\right)^{2}}{4\varepsilon^{2}}$, where Φ(z) is the cumulative distribution function (CDF) for the standard normal distribution. We go on to describe the algorithm RNAborMEA, which for an arbitrary initial structure S0 and for all values 0 ≤ k < K, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S0. Computation time is O(n^3 · K^2), and memory requirements are O(n^2 · K).
We analyze a sample TPP riboswitch, and apply our algorithm to the class of purine riboswitches. Conclusions: The approximation of by sampling, with rigorous bound on accuracy, together with the computation of maximum expected accuracy k-neighbors by RNAborMEA, provide additional tools toward conformational switch detection. Results from RNAborMEA are quite distinct from other tools, such as , hence may provide orthogonal information when looking for suboptimal structures or conformational switches. Source code for RNAborMEA can be downloaded from http://sourceforge.net/projects/rnabormea/ or http://bioinformatics.bc.edu/clotelab/RNAborMEA/.
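The sampling bound can be evaluated numerically with a short sketch. It assumes the bound reads N > Φ⁻¹(p/(2K))² / (4ε²), which is our reading of the formula in the abstract, and it represents a structure as a set of base pairs; the function names are hypothetical and the sampled structures would come from an external Boltzmann sampler that is not shown.

```python
import math
from statistics import NormalDist


def min_samples(eps, p, K):
    """Smallest integer N with N > Phi^{-1}(p / (2K))^2 / (4 eps^2)."""
    z = NormalDist().inv_cdf(p / (2 * K))
    return math.floor(z * z / (4 * eps * eps)) + 1


def base_pair_distance(s, t):
    """Number of base pairs that belong to exactly one of the two structures."""
    return len(set(s) ^ set(t))


def neighbor_frequencies(s0, samples, K):
    """Relative frequency of k-neighbors of s0 among `samples`, for 0 <= k < K."""
    counts = [0] * K
    for s in samples:
        k = base_pair_distance(s0, s)
        if k < K:
            counts[k] += 1
    return [c / len(samples) for c in counts]


if __name__ == "__main__":
    print(min_samples(eps=0.01, p=0.05, K=10))
    s0 = {(0, 5)}
    sampled = [{(0, 5)}, set(), {(0, 5), (1, 4)}]
    print(neighbor_frequencies(s0, sampled, K=2))
```

The required sample size grows quadratically in 1/ε but only slowly in K, since K enters through the quantile of the normal distribution.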