mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessibility criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n(3)) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n(2) psi(n)), where psi(n) is shown to be constant on average under standard polymer folding models. A benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and to the mining of exposed binding sites of tissue-specific microRNAs in A. thaliana.
An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and examined and its effectiveness on genomic data is demonstrated.
Cotranslational synthesis of proteins into the endoplasmic reticulum is preceded by targeting of the translating mRNA once a signal peptide emerges from the ribosome exit tunnel. Many mRNAs, however, are unlikely to be targeted by this process because they encode proteins that do not contain a signal peptide or because they are too short to be recognized by the signal recognition particle. Herein we tested the possible involvement of the 39-UTR in the localization of an mRNA that encodes a very short Saccharomyces cerevisiae protein (Pmp1). We found by ribosome density mapping, sedimentation analysis, differential centrifugation, and fluorescent in situ hybridization that the 39-UTR is essential for the association of the transcript with membrane compartments. Fusion of the 39-UTR to heterologous open reading frames conferred on them a sedimentation and cellular localization pattern resembling that of PMP1. Mutation analysis revealed that a repeating UG-rich sequence within the 39-UTR is important for membrane association. Taken together, our results reveal an essential role for elements within the 39-UTR in the localization of an mRNA that is likely to be ignored by the standard signal-dependant mechanism.
Motivation: Association analysis is the method of choice for studying complex multifactorial diseases. The premise of this method is that affected persons contain some common genomic regions with similar SNP alleles and such areas will be found in this analysis. An important disadvantage of GWA studies is that it does not distinguish between genomic areas that are inherited from a common ancestor [identical by descent (IBD)] and areas that are identical merely by state [identical by state (IBS)]. Clearly, areas that can be marked with higher probability as IBD and have the same correlation with the disease status of identical areas that are more probably only IBS, are better candidates to be causative, and yet this distinction is not encoded in standard association analysis.Results: We develop a factorial hidden Markov model-based algorithm for computing genome-wide IBD sharing. The algorithm accepts as input SNP data of measured individuals and estimates the probability of IBD at each locus for every pair of individuals. For two g-degree relatives, when g≥8, the computation yields a precision of IBD tagging of over 50% higher than previous methods for 95% recall. Our algorithm uses a first-order Markovian model for the linkage disequilibrium process and employs a reduction of the state space of the inheritance vector from being exponential in g to quadratic. The higher accuracy along with the reduced time complexity marks our method as a feasible means for IBD mapping in practical scenarios.Availability: A software implementation, called IBDMAP, is freely available at http://bioinfo.cs.technion.ac.il/IBDmap.Contact: sberco@gmail.com
We develop an hidden Markov model (HMM)-based algorithm for computing exact parametric and non-parametric linkage scores in larger pedigrees than was possible before. The algorithm is applicable whenever there are chains of persons in the pedigree with no genetic measurements and with unknown affection status. The algorithm is based on shrinking the state space of the HMM considerably using such chains. In a two g-degree cousins pedigree the reduction drops the state space from being exponential in g to being linear in g. For a Finnish family in which two affected children suffer from a rare cold-inducing sweating syndrome, we were able to reduce the state space by more than five orders of magnitude from 250 to 232. In another pedigree of state-space size of 227, used for a study of pituitary adenoma, the state space reduced by a factor of 8.5 and consequently exact linkage scores can now be computed, rather than approximated.Contact: dang@cs.technion.ac.ilSupplementary information: Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.