By combining crystallographic and NMR structural data for RNA-bound amino acids within riboswitches, aptamers, and RNPs, chemical principles governing specific RNA interaction with amino acids can be deduced. Such principles, which we summarize in a "polar profile", are useful in explaining newly selected specific RNA binding sites for free amino acids bearing varied side chains: charged, neutral polar, aliphatic, and aromatic. Such amino acid sites can be queried for parallels to the genetic code. Using recent sequences for 337 independent binding sites directed to 8 amino acids and containing 18,551 nucleotides in all, we show a highly robust connection between amino acids and cognate coding triplets within their RNA binding sites. The apparent probability that cognate triplets around these sites are unrelated to binding sites is $P \approx 5.3 \times 10^{-45}$ for codons overall, and $P \approx 2.1 \times 10^{-46}$ for cognate anticodons. Therefore, some triplets are unequivocally localized near their present amino acids. Accordingly, there was likely a stereochemical era during evolution of the genetic code, relying on chemical interactions between amino acids and the tertiary structures of RNA binding sites. Use of cognate coding triplets in RNA binding sites is nevertheless sparse, with only 21% of possible triplets appearing. Reasoning from such broad recurrent trends in our results, a majority (approximately 75%) of modern amino acids entered the code in this stereochemical era; nevertheless, only a minority (approximately 21%) of modern codons and anticodons were assigned via RNA binding sites. A Direct RNA Template scheme embodying a credible early history for coded peptide synthesis is readily constructed based on these observations.
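The P values above express how unlikely the observed concentration of cognate triplets in binding sites would be if triplets were placed without regard to the sites. The abstract does not state which test produced them, so the following is only a hedged illustration of one standard way to score such enrichment: a one-sided hypergeometric tail. Given N triplet positions in all, K of them cognate, and n positions falling inside binding sites, it gives the chance of seeing k or more cognate triplets inside the sites by luck alone. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N total, K successes, n draws).

    Hypothetical reading: N triplet positions overall, K of them cognate
    triplets, n positions inside binding sites, k cognate triplets observed
    inside binding sites.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(K, n)
    # Exact integer arithmetic; only the final ratio becomes a float.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total
```

For example, with 20 triplet positions, 5 of them cognate, and 4 positions inside a site that are all cognate, `hypergeom_upper_tail(20, 5, 4, 4)` gives 5/4845, about 1.03e-3. Real binding-site data would need care with overlapping triplets and non-independent positions, which this sketch ignores.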
We have implemented in Python the COmparative GENomic Toolkit (PyCogent), a fully integrated and thoroughly tested framework for novel probabilistic analyses of biological sequences, devising workflows, and generating publication-quality graphics. PyCogent includes connectors to remote databases, built-in generalized probabilistic techniques for working with biological sequences, and controllers for third-party applications. The toolkit takes advantage of parallel architectures, runs on a range of hardware and operating systems, and is available under the GNU General Public License from http://sourceforge.net/projects/pycogent.

Rationale

The genetic divergence of species is affected by both DNA metabolic processes and natural selection. Processes contributing to genetic variation that are undetectable with intraspecific data may be detectable by interspecific analyses because signal accumulates over evolutionary time scales. Because of this greater statistical power, there is interest in applying comparative analyses to an increasing number and diversity of problems, in particular analyses that integrate sequence and phenotype. Significant barriers to extending comparative analyses to genome-indexed phenotypic data include the narrow focus of most analytical tools and the diverse array of data sources, formats, and tools available. Theoretically coherent integrative analyses can be conducted by combining probabilistic models of different aspects of genotype. Probabilistic models of sequence change underlie many core bioinformatics tasks, including similarity search, sequence alignment, phylogenetic inference, and ancestral state reconstruction. Probabilistic models allow the use of likelihood inference, a powerful statistical approach, to establish the significance of differences in support of competing hypotheses.
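To make the link between a probabilistic model of sequence change and likelihood inference concrete, here is a minimal sketch using the textbook Jukes-Cantor (JC69) model for a pair of aligned sequences, summarized as counts of identical and differing sites. It is not PyCogent code and does not use PyCogent's API; the function names and the example counts are hypothetical. It computes the log-likelihood at a distance d, the closed-form maximum-likelihood distance, and the P value of a one-degree-of-freedom likelihood ratio test of a fixed distance against the fitted one.

```python
import math


def jc69_site_probs(d: float) -> tuple[float, float]:
    """Probability of one aligned site showing a specific identical pair (x, x)
    or a specific differing pair (x, y) under JC69 at distance d (expected
    substitutions per site), with equal base frequencies of 1/4."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)
    p_diff = 0.25 * (0.25 - 0.25 * e)
    return p_same, p_diff


def jc69_loglik(n_same: int, n_diff: int, d: float) -> float:
    """Log-likelihood of an ungapped pairwise alignment summarized by counts."""
    p_same, p_diff = jc69_site_probs(d)
    ll = n_same * math.log(p_same)
    if n_diff:
        ll += n_diff * math.log(p_diff)  # requires d > 0
    return ll


def jc69_mle_distance(n_same: int, n_diff: int) -> float:
    """Closed-form maximum-likelihood JC69 distance."""
    p = n_diff / (n_same + n_diff)
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined when p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def lrt_pvalue_1df(ll_alt: float, ll_null: float) -> float:
    """P value of a likelihood ratio test with one degree of freedom:
    2 * (ll_alt - ll_null) compared with a chi-square(1) distribution."""
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))
```

With 90 identical and 10 differing sites, `jc69_mle_distance(90, 10)` is about 0.107, and testing a fixed distance of 0.3 against it with `lrt_pvalue_1df(jc69_loglik(90, 10, d_hat), jc69_loglik(90, 10, 0.3))` gives a P value well below 0.05. The chi-square(1) tail is computed with `math.erfc` to stay within the standard library.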
Linking different analyses through a shared and explicit probabilistic model of sequence change is thus extremely valuable, and provides a basis for generalizing analyses to more complex models of evolution (for example, to incorporate dependence between sites). Numerous studies have established how biological factors reflecting metabolic or selective influences can be represented in substitution models as specific parameters that affect rates of interchange between sequence motifs or the spatial occurrence of such rates [1][2][3][4]. Given this solid grounding, it is desirable to have a toolkit that allows flexible parameterization of probabilistic models and interchange of appropriate modules.

There are many existing software packages that can manipulate biological sequences and structures, but few allow specification of both truly novel statistical models and detailed workflow control for genome-scale datasets. Traditional phylogenetic analysis applications [5,6] typically provide a number of explicitly defined statistical models that are difficult to modify. One exception in which the parameterization of entirely novel substitution models was poss...
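The idea that a biological factor enters a substitution model as a parameter affecting rates of interchange can be sketched with a single parameter: the transition/transversion rate ratio kappa of the Kimura (K80) model. The sketch below builds the rate matrix Q from kappa and computes transition probabilities P(t) = exp(Qt). It is plain Python chosen for self-containment, not PyCogent's model-specification API; the matrix exponential uses a simple truncated Taylor series with scaling and squaring as a stand-in for a production routine, and all function names are hypothetical.

```python
import math

BASES = "TCAG"
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """True if x->y stays within purines (A, G) or within pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)


def k80_rate_matrix(kappa: float) -> list[list[float]]:
    """K80 rate matrix (rows/columns in BASES order), scaled so the mean
    substitution rate is 1, making t the expected substitutions per site."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = kappa if is_transition(x, y) else 1.0
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    scale = -q[0][0]  # equal base frequencies: mean rate = kappa + 2
    return [[v / scale for v in row] for row in q]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms: int = 20, squarings: int = 8):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[v / 2 ** squarings for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def transition_probs(kappa: float, t: float) -> list[list[float]]:
    """P(t) = exp(Q t) for the K80 model."""
    q = k80_rate_matrix(kappa)
    return mat_exp([[v * t for v in row] for row in q])


def k80_closed_form(kappa: float, t: float) -> tuple[float, float, float]:
    """Analytic K80 probabilities (same base, a given transition, a given
    transversion) under the same scaling, for cross-checking mat_exp."""
    alpha, beta = kappa / (kappa + 2), 1 / (kappa + 2)
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    return 0.25 + 0.25 * e1 + 0.5 * e2, 0.25 + 0.25 * e1 - 0.5 * e2, 0.25 - 0.25 * e1
```

For example, `transition_probs(2.0, 0.5)` returns a 4 x 4 matrix (rows and columns in T, C, A, G order) whose rows each sum to 1 and whose entries agree with `k80_closed_form(2.0, 0.5)`; setting kappa to 1.0 recovers the Jukes-Cantor model. Building P(t) from a generic Q, rather than from a model-specific formula, is what lets new parameters be added without rederiving closed forms.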
Direct sensing of intracellular metabolite concentrations by riboswitch RNAs provides an economical and rapid means to maintain metabolic homeostasis. Since many organisms employ the same class of riboswitch to control different genes or transcription units, it is likely that functional variation exists in riboswitches such that activity is tuned to meet cellular needs. Using a bioinformatic approach, we have identified a region of the purine riboswitch aptamer domain that displays conservation patterns linked to riboswitch activity. Aptamer domain compositions within this region can be divided into nine classes that display a spectrum of activities. Naturally occurring compositions in this region favor rapid association rate constants and slow dissociation rate constants for ligand binding. Using X-ray crystallography and chemical probing, we demonstrate that both the free and bound states are influenced by the composition of this region and that modest sequence alterations have a dramatic impact on activity. The introduction of non-natural compositions results in the inability to regulate gene expression in vivo, suggesting that aptamer domain activity is highly plastic and thus readily tunable to meet cellular needs.
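The roles of association and dissociation rate constants can be made concrete with the standard relations for simple one-step binding: K_D = k_off / k_on, complex half-life = ln 2 / k_off, equilibrium fraction bound = [L] / ([L] + K_D), and, with ligand in excess, an observed approach rate of k_on[L] + k_off. The sketch below assumes that one-step mechanism; the class name and the rate constants in the example are hypothetical, not measurements from this study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingKinetics:
    k_on: float   # association rate constant, M^-1 s^-1
    k_off: float  # dissociation rate constant, s^-1

    @property
    def k_d(self) -> float:
        """Equilibrium dissociation constant (M)."""
        return self.k_off / self.k_on

    @property
    def half_life(self) -> float:
        """Half-life of the ligand-aptamer complex (s)."""
        return math.log(2) / self.k_off

    def fraction_bound(self, ligand: float) -> float:
        """Equilibrium fraction of aptamer bound at free ligand concentration (M)."""
        return ligand / (ligand + self.k_d)

    def observed_rate(self, ligand: float) -> float:
        """Rate constant for approach to equilibrium with ligand in excess (s^-1)."""
        return self.k_on * ligand + self.k_off
```

With `BindingKinetics(k_on=1e5, k_off=1e-3)`, K_D is 1e-8 M, the complex half-life is about 693 s, and half of the aptamer is bound at 10 nM ligand. Faster association and slower dissociation both lower K_D, but a slow k_off additionally keeps ligand bound longer once captured.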
Understanding patterns of rRNA evolution is critical for a number of fields, including structure prediction and phylogeny. The standard model of RNA evolution is that compensatory mutations in stems make up the bulk of the changes between homologous sequences, while unpaired regions are relatively homogeneous. We show that considerable heterogeneity exists in the relative rates of evolution of different secondary structure categories (stems, loops, bulges, etc.) within the rRNA, and that in eukaryotes, loops actually evolve much faster than stems. Both rates of evolution and abundance of different structural categories vary with distance from functionally important parts of the ribosome such as the tRNA path and the peptidyl transferase center. For example, fast-evolving residues are found mainly at the surface, stems are enriched at the subunit interface, and junctions are enriched near the peptidyl transferase center. However, different secondary structure categories evolve at different rates even when these effects are accounted for. The results demonstrate that relative rates and patterns of evolution are lineage specific, suggesting that phylogenetically and structurally specific models will improve evolutionary and structural predictions.
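Comparing relative rates across secondary structure categories requires grouping alignment columns by category and summarizing variability within each group. The rate estimation used in the study is not described here, so the sketch below swaps in a deliberately crude proxy: for each column, the fraction of residues that differ from the column's most common residue, averaged per category. Unlike a phylogenetic rate model, it ignores the tree entirely. The alignment, the single-letter category codes, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def column_variability(column: str) -> float:
    """Fraction of non-gap residues that differ from the column's most common residue."""
    residues = [c for c in column if c not in "-."]
    if not residues:
        return 0.0
    _, top = Counter(residues).most_common(1)[0]
    return 1.0 - top / len(residues)


def variability_by_category(alignment: list[str], categories: str) -> dict[str, float]:
    """Mean column variability for each structure category.

    alignment: equal-length aligned sequences.
    categories: one label per column, e.g. 'S' stem, 'L' loop, 'B' bulge,
    'J' junction (hypothetical codes).
    """
    if any(len(seq) != len(categories) for seq in alignment):
        raise ValueError("every sequence must match the category string length")
    scores = defaultdict(list)
    for i, label in enumerate(categories):
        column = "".join(seq[i] for seq in alignment)
        scores[label].append(column_variability(column))
    return {label: sum(v) / len(v) for label, v in scores.items()}
```

For the toy alignment `["GGAUCC", "GGCUCC", "GGGACC"]` with categories `"SSLLSS"`, stem columns score 0.0 and loop columns average 0.5. Running the same summary separately per lineage would be the simplest way to see whether category-specific patterns differ between lineages.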
Many studies have suggested that the modern cloverleaf structure of tRNA may have arisen through duplication of a primordial hairpin, but the timing of this duplication event has been unclear. Here we measure the level of sequence identity between the two halves of each of a large sample of tRNAs and compare this level to that of chimeric tRNAs constructed either within or between groups defined by phylogeny and/or specificity. We find that actual tRNAs have significantly more matches between the two halves than do random sequences that can form the tRNA structure, but there is no difference in the average level of matching between the two halves of an individual tRNA and the average level of matching between the two halves of the chimeric tRNAs in any of the sets we constructed. These results support the hypothesis that the modern tRNA cloverleaf arose from a single hairpin duplication prior to the divergence of modern tRNA specificities and the three domains of life.
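The comparison between halves of real tRNAs and halves of chimeric tRNAs can be sketched as follows. The reasoning it tests: if duplication had occurred separately after tRNA specificities diverged, the two halves of a given tRNA should match each other better than a 5' half and a 3' half drawn from different tRNAs. As simplifying assumptions, each tRNA is supplied as a 5' half and a 3' half already aligned position by position to equal length, and a match is an identical base at the same position; how the study defined and aligned the halves is not stated here. The sequences and function names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def half_matches(five_prime: str, three_prime: str) -> int:
    """Count identical bases at corresponding positions of two aligned halves."""
    if len(five_prime) != len(three_prime):
        raise ValueError("halves must be aligned to equal length")
    return sum(a == b for a, b in zip(five_prime, three_prime))


def actual_vs_chimeric(trnas: list[tuple[str, str]]) -> tuple[float, float]:
    """Mean matches within each real tRNA vs. within all chimeras built by
    joining the 5' half of one tRNA to the 3' half of a different one."""
    if len(trnas) < 2:
        raise ValueError("need at least two tRNAs to build chimeras")
    actual = mean(half_matches(f, t) for f, t in trnas)
    chimeric = mean(half_matches(a[0], b[1]) for a, b in permutations(trnas, 2))
    return actual, chimeric
```

For `[("GCGGAU", "GCGGAU"), ("UUCCAA", "UUGGAA")]` the result is `(5, 2)`: the real tRNAs average 5 matches between halves and the chimeras 2. Restricting `trnas` to one phylogenetic or specificity group, or mixing groups, reproduces the within-group and between-group chimera sets described above.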
Seven new arginine binding motifs have been selected from a heterogeneous RNA pool containing 17, 25, and 50mer randomized tracts, yielding 131 independently derived binding sites that are multiply isolated. The shortest 17mer random region is sufficient to build varied arginine binding sites using five different conserved motifs (motifs 1a, 1b, 1c, 2, and 4). Dissociation constants are in the fractional millimolar to millimolar range. Binding sites are amino acid side-chain specific and discriminate moderately between L- and D-stereoisomers of arginine, suggesting a molecular focus on the side-chain guanidinium. An arginine coding triplet (codon/anticodon) is highly conserved within the largest family of Arg sites (72% of all sequences), as has also been found in minimal, most prevalent RNA binding sites for Ile, His, and Trp.
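Checking a binding site for an arginine coding triplet amounts to scanning it for any of the six standard arginine codons (CGU, CGC, CGA, CGG, AGA, AGG) or their anticodons, which are the reverse complements (ACG, GCG, UCG, CCG, UCU, CCU). The sketch below scans every overlapping window of an RNA string (DNA input is converted T to U); the function names and the example sequence are hypothetical and not taken from the selected motifs.

```python
ARG_CODONS = {"CGU", "CGC", "CGA", "CGG", "AGA", "AGG"}
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


# Anticodons pair antiparallel with codons, hence the reverse complement.
ARG_ANTICODONS = {reverse_complement(c) for c in ARG_CODONS}


def find_arg_triplets(site: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, triplet, 'codon' or 'anticodon') for every
    overlapping window of the site that matches an arginine coding triplet."""
    site = site.upper().replace("T", "U")
    hits = []
    for i in range(len(site) - 2):
        triplet = site[i:i + 3]
        if triplet in ARG_CODONS:
            hits.append((i, triplet, "codon"))
        if triplet in ARG_ANTICODONS:
            hits.append((i, triplet, "anticodon"))
    return hits
```

For example, `find_arg_triplets("GGACGUAAGG")` reports an anticodon ACG at position 2, a codon CGU at position 3, and a codon AGG at position 7. Tallying such hits inside and outside conserved motif positions is the raw input for the kind of enrichment statistics discussed for cognate triplets.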
tRNAs are among the most ancient, highly conserved sequences on earth, but are often thought to be poor phylogenetic markers because they are short, often subject to horizontal gene transfer, and easily change specificity. Here we use UniFrac, an algorithm now commonly used in microbial ecology, to cluster 175 genomes spanning all three domains of life based on the phylogenetic relationships among their complete tRNA pools. We find that the overall pattern of similarities and differences in the tRNA pools recaptures universal phylogeny to a remarkable extent, and that the resulting tree is similar to the distribution of bootstrapped rRNA trees from the same genomes. In contrast, trees derived from tRNAs of identical specificity or from individual isoacceptors were generally of lower quality. However, some tRNA isoacceptors were very good predictors of the overall pattern of organismal evolution. These results show that UniFrac can extract meaningful biological patterns even from phylogenies with a high level of statistical inaccuracy and horizontal gene transfer, and that, overall, the pattern of tRNA evolution tracks universal phylogeny and provides a background against which we can test hypotheses about the evolution of individual isoacceptors.
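Unweighted UniFrac measures the distance between two sets of tips on a shared tree as the fraction of branch length leading to tips of only one set, out of all branch length leading to tips of either set. In the setting above, the tips would be tRNA sequences and each set one genome's tRNA pool; a matrix of such pairwise distances can then be clustered. The sketch below implements that definition on a small tree encoded as a child-to-(parent, branch length) mapping. The encoding, function names, and example tree are hypothetical, and this is not the implementation used in the study.

```python
def branches_above(tips: set[str], parent: dict[str, tuple[str, float]]) -> set[str]:
    """Nodes whose branch (to its parent) lies on a path from any tip to the root."""
    covered = set()
    for node in tips:
        # Stop early at a covered node: its whole path to the root is already included.
        while node in parent and node not in covered:
            covered.add(node)
            node = parent[node][0]
    return covered


def unweighted_unifrac(a: set[str], b: set[str],
                       parent: dict[str, tuple[str, float]]) -> float:
    """Unique branch length divided by total branch length spanned by a or b."""
    in_a = branches_above(a, parent)
    in_b = branches_above(b, parent)
    total = sum(parent[n][1] for n in in_a | in_b)
    unique = sum(parent[n][1] for n in in_a ^ in_b)
    return unique / total if total else 0.0
```

On a four-tip tree ((a,b)n1,(c,d)n2)root with tip branches of length 1 and internal branches of length 2, the sets {a, b} and {c, d} share no branches (distance 1.0), {a, c} and {b, d} share only the two internal branches (distance 0.5), and identical sets give 0.0.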