Single-stranded RNA viruses encompass broad classes of infectious agents and cause the common cold, cancer, AIDS, and other serious health threats. Viral replication is regulated at many levels, including using conserved genomic RNA structures. Most potential regulatory elements within viral RNA genomes are uncharacterized. Here we report the structure of an entire HIV-1 genome at single nucleotide resolution using SHAPE, a high-throughput RNA analysis technology. The genome encodes protein structure at two levels. In addition to the correspondence between RNA and protein primary sequences, a correlation exists between high levels of RNA structure and sequences that encode inter-domain loops in HIV proteins. This correlation suggests RNA structure modulates ribosome elongation to promote native protein folding. Some simple genome elements previously shown to be important, including the ribosomal gag-pol frameshift stem-loop, are components of larger RNA motifs. We also identify organizational principles for unstructured RNA regions. Highly used splice acceptors lie in unstructured motifs and hypervariable regions are sequestered from flanking genome regions by stable insulator helices. These results emphasize that the HIV-1 genome and, potentially, many coding RNAs are punctuated by numerous previously unrecognized regulatory motifs and that extensive RNA structure may constitute an additional level of the genetic code.
A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified. Information is encoded in an RNA molecule at two levels: in its primary sequence and in its ability to form higher-order secondary and tertiary structures. Nearly all RNAs can fold to form some secondary structure and, in many RNAs, highly structured regions encode important regulatory motifs. Such structured regulatory elements can be composed of canonical base pairs but may also feature specialized and distinctive RNA structures. Among the best characterized of these specialized structures are RNA pseudoknots. Pseudoknots are relatively rare but occur overwhelmingly in functionally important regions of RNA (2-4).
For example, all of the large catalytic RNAs contain pseudoknots (5, 6); roughly two-thirds of the known classes of riboswitches contain pseudoknots that appear to be essential for ligand binding and gene regulatory functions (7); and pseudoknots occur prominently in the regulatory elements that viruses use to usurp cellular metabolism (3). Pseudoknots are thus harbingers of biological function. An important and challenging goal is to identify these structures reliably. Pseudoknots are excluded from the most widely used algorithms that model RNA secondary structure (8). This exclusion reflects both the challenge of incorporating the pseudoknot structure into the efficient dynamic programming algorithm used in the most popular secondary structure prediction approaches and the additional computational effort required. The prediction of lowest free energy structures with pseudoknots is NP-complete (9), which means that no known algorithm can find the lowest free energy structure in time polynomial in the sequence length. In addition, allowing pseudoknots greatly increases the number of (incorrect) hel...
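The definition of a pseudoknot given above can be made concrete with a small check. The following is a minimal sketch, not the prediction method described in the abstract: it assumes a secondary structure is supplied as a list of 1-based `(i, j)` base pairs, and it only tests whether any two pairs cross, that is, whether `i < k < j < l` holds for pairs `(i, j)` and `(k, l)`. The function name and example structures are hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalize each pair to (5' index, 3' index) and sort by 5' index.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            # The second pair opens inside the first helix but closes outside it.
            if i < k < j < l:
                return True
    return False


# Hypothetical toy structures: a simple hairpin and an H-type pseudoknot.
nested = [(1, 10), (2, 9), (3, 8)]
knotted = [(1, 10), (2, 9), (5, 15), (6, 14)]

print(has_pseudoknot(nested))
print(has_pseudoknot(knotted))
```

This pairwise scan is quadratic in the number of pairs, which is adequate for checking a single structure; it says nothing about the far harder problem of searching fold space for the lowest free energy pseudoknotted structure.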
RNA secondary structure modeling is a challenging problem, and recent successes have raised the standards for accuracy, consistency, and tractability. Large increases in accuracy have been achieved by including data on reactivity toward chemical probes: Incorporation of 1M7 SHAPE reactivity data into an mfold-class algorithm results in median accuracies for base pair prediction that exceed 90%. However, a few RNA structures are modeled with significantly lower accuracy. Here, we show that incorporating differential reactivities from the NMIA and 1M6 reagents, which detect noncanonical and tertiary interactions, into prediction algorithms results in highly accurate secondary structure models for RNAs that were previously shown to be difficult to model. For these RNAs, 93% of accepted canonical base pairs were recovered in SHAPE-directed models. Discrepancies between accepted and modeled structures were small and appear to reflect genuine structural differences. Three-reagent SHAPE-directed modeling scales concisely to structurally complex RNAs to resolve the in-solution secondary structure analysis problem for many classes of RNA.
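The "93% of accepted canonical base pairs were recovered" figure is a sensitivity: correctly predicted pairs divided by accepted pairs. A minimal sketch of that calculation follows. It assumes exact-match scoring of `(i, j)` pairs, which may be stricter than the scoring used in the paper, and it also reports positive predictive value (PPV), a companion metric that the abstract does not mention. All names and the toy data are hypothetical.

```python
def pair_set(pairs):
    """Normalize base pairs so (i, j) and (j, i) compare equal."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def sensitivity_ppv(accepted, predicted):
    """Return (sensitivity, PPV) under exact-match scoring (an assumption)."""
    acc, pred = pair_set(accepted), pair_set(predicted)
    correct = len(acc & pred)
    sens = correct / len(acc) if acc else 0.0   # fraction of accepted pairs recovered
    ppv = correct / len(pred) if pred else 0.0  # fraction of predicted pairs that are accepted
    return sens, ppv


# Hypothetical toy data: three of four accepted pairs are recovered.
accepted = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 18), (5, 16)]

sens, ppv = sensitivity_ppv(accepted, predicted)
print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```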
All retroviral genomic RNAs contain a cis-acting packaging signal by which dimeric genomes are selectively packaged into nascent virions. However, it is not understood how Gag (the viral structural protein) interacts with these signals to package the genome with high selectivity. We probed the structure of murine leukemia virus RNA inside virus particles using SHAPE, a high-throughput RNA structure analysis technology. These experiments showed that NC (the nucleic acid binding domain derived from Gag) binds within the virus to the sequence UCUG-UR-UCUG. Recombinant Gag and NC proteins bound to this same RNA sequence in dimeric RNA in vitro; in all cases, interactions were strongest with the first U and final G in each UCUG element. The RNA structural context is critical: High-affinity binding requires base-paired regions flanking this motif, and two UCUG-UR-UCUG motifs are specifically exposed in the viral RNA dimer. Mutating the guanosine residues in these two motifs (only four nucleotides per genomic RNA) reduced packaging 100-fold, comparable to the level of nonspecific packaging. These results thus explain the selective packaging of dimeric RNA. This paradigm has implications for RNA recognition in general, illustrating how local context and RNA structure can create information-rich recognition signals from simple single-stranded sequence elements in large RNAs.
retrovirus | RNA recognition code | RNA SHAPE chemistry
Expression of a single viral protein, termed Gag, is sufficient for assembly of retrovirus-like particles in mammalian cells. If present in the cell, the viral genomic RNA (vRNA) is selectively packaged into nascent particles; this selectivity is due to a cis-acting packaging signal in the RNA, termed Ψ (1, 2).
Remarkably, when no Ψ-containing RNA is present, Gag still assembles efficiently, encapsidating cellular mRNAs nonselectively in place of the vRNA (3-5). There are many indications that Ψ represents a high-affinity binding site for the Gag protein both in HIV-1 and in simpler retroviruses (6-14). However, the molecular mechanisms underlying selective encapsidation of vRNAs are incompletely understood, as are the features that enable Gag to bind preferentially to vRNA rather than to other cellular RNAs. Gag proteins contain several distinct domains, always including matrix (MA), capsid, and nucleocapsid (NC). vRNA packaging is mediated by the multidomain Gag protein, but Gag is cleaved following release of the virus from the cell. The NC domain plays a principal role in interactions with nucleic acids and is largely responsible for the specific interaction between Gag and its cognate viral RNA (12, 13). This domain of Gag is highly basic and contains one or more "zinc knuckles" with a conserved spacing of Zn2+-coordinating cysteine and histidine residues. Mutations that abolish Zn2+ coordination impair selective encapsidation of vRNA during virus assembly (6, 15). In addition, MA domains of many retroviral Gag proteins interact with nucleic acids (16-21) and may also contribute to specific i...
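The UCUG-UR-UCUG element named in the abstract above can be located in a sequence with a short pattern search. This is a sketch under two assumptions: the hyphens only delimit sub-elements of one contiguous 10-nucleotide motif, and R is the IUPAC code for a purine (A or G). The toy sequence is invented for illustration and is not murine leukemia virus sequence. Because the abstract stresses that flanking base-paired structure is required for high-affinity binding, a sequence match alone only flags candidate sites.

```python
import re

# Contiguous UCUG + U + purine (A/G) + UCUG; the lookahead also reports overlapping hits.
MOTIF = re.compile(r"(?=(UCUGU[AG]UCUG))")


def find_motifs(rna):
    """Return (1-based start position, matched sequence) for each candidate motif."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(rna)]


# Hypothetical toy sequence containing two candidate motifs.
toy = "GGAUCUGUAUCUGCCAAUCUGUGUCUGAA"
print(find_motifs(toy))
```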
The difficulty of analyzing higher-order RNA structure, especially for folding intermediates and for RNAs whose functions require domains that are conformationally flexible, emphasizes the need for new approaches for modeling RNA tertiary structure accurately. Here, we report a concise approach in which facile RNA structure probing experiments are interpreted using a computational algorithm carefully tailored to optimize both the resolution and the refinement speed for the resulting structures, without requiring user intervention. The RNA secondary structure is first established using SHAPE chemistry. We then use a sequence-directed cleavage agent that can be placed arbitrarily in many helical motifs to obtain high-quality inter-residue distances. We interpret this in-solution chemical information using a fast, coarse-grained, discrete molecular dynamics engine in which each RNA nucleotide is represented by pseudoatoms for the phosphate, ribose, and nucleobase groups. By this approach, we refine base-paired positions in yeast tRNA(Asp) to 4 Å RMSD without any preexisting information or assumptions about secondary or tertiary structures. This blended experimental and computational approach has the potential to yield native-like models for the diverse universe of functionally important RNAs whose structures cannot be characterized by conventional structural methods.
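The 4 Å figure above is a root-mean-square deviation (RMSD) between model and reference coordinates. The sketch below shows only the final arithmetic, assuming the two coordinate sets list the same pseudoatoms in the same order and have already been superimposed; real comparisons normally perform an optimal superposition first (for example with the Kabsch algorithm), which is omitted here. The coordinates are invented toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units for two pre-aligned, equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


# Hypothetical pseudoatom positions (in Å): the model is shifted 4 Å along y.
model = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
reference = [(0.0, 4.0, 0.0), (3.0, 4.0, 0.0), (6.0, 4.0, 0.0)]

print(rmsd(model, reference))
```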