Single-stranded RNA viruses encompass broad classes of infectious agents and cause the common cold, cancer, AIDS, and other serious health threats. Viral replication is regulated at many levels, including using conserved genomic RNA structures. Most potential regulatory elements within viral RNA genomes are uncharacterized. Here we report the structure of an entire HIV-1 genome at single nucleotide resolution using SHAPE, a high-throughput RNA analysis technology. The genome encodes protein structure at two levels. In addition to the correspondence between RNA and protein primary sequences, a correlation exists between high levels of RNA structure and sequences that encode inter-domain loops in HIV proteins. This correlation suggests RNA structure modulates ribosome elongation to promote native protein folding. Some simple genome elements previously shown to be important, including the ribosomal gag-pol frameshift stem-loop, are components of larger RNA motifs. We also identify organizational principles for unstructured RNA regions. Highly used splice acceptors lie in unstructured motifs and hypervariable regions are sequestered from flanking genome regions by stable insulator helices. These results emphasize that the HIV-1 genome and, potentially, many coding RNAs are punctuated by numerous previously unrecognized regulatory motifs and that extensive RNA structure may constitute an additional level of the genetic code.
A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified. Information is encoded in an RNA molecule at two levels: in its primary sequence and in its ability to form higher-order secondary and tertiary structures. Nearly all RNAs can fold to form some secondary structure and, in many RNAs, highly structured regions encode important regulatory motifs. Such structured regulatory elements can be composed of canonical base pairs but may also feature specialized and distinctive RNA structures. Among the best characterized of these specialized structures are RNA pseudoknots. Pseudoknots are relatively rare but occur overwhelmingly in functionally important regions of RNA (2-4).
For example, all of the large catalytic RNAs contain pseudoknots (5, 6); roughly two-thirds of the known classes of riboswitches contain pseudoknots that appear to be essential for ligand binding and gene regulatory functions (7); and pseudoknots occur prominently in the regulatory elements that viruses use to usurp cellular metabolism (3). Pseudoknots are thus harbingers of biological function. An important and challenging goal is to identify these structures reliably. Pseudoknots are excluded from the most widely used algorithms that model RNA secondary structure (8). This exclusion reflects both the challenge of incorporating the pseudoknot structure into the efficient dynamic programming algorithm used in the most popular secondary structure prediction approaches and the additional computational effort required. The prediction of lowest free energy structures with pseudoknots is NP-complete (9), which means that no known algorithm can find the lowest free energy structure in time that is polynomial in sequence length. In addition, allowing pseudoknots greatly increases the number of (incorrect) hel...
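The pseudoknot definition used in this entry (loop nucleotides pairing outside the helices that close the loop) is equivalent to saying that two base pairs (i, j) and (k, l) cross, with i < k < j < l. The following minimal Python sketch checks a structure for such crossings. The dot-bracket input format, with `()` for one set of helices and `[]` for a second set, and all function names are illustrative assumptions; they are not taken from the paper, and this is not the authors' prediction algorithm.

```python
def parse_pairs(dot_bracket):
    """Return a sorted list of (i, j) base pairs from dot-bracket notation.

    '()' marks one set of pairs; '[]' marks a second set that may cross it.
    Positions are 0-based. Any other character is treated as unpaired.
    """
    openers = {"(": ")", "[": "]"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            # Pair this closing bracket with the most recent matching opener.
            pairs.append((stacks[closers[ch]].pop(), pos))
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    crossings = []
    for index, (i, j) in enumerate(pairs):
        for k, l in pairs[index + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(dot_bracket):
    """True if at least two base pairs in the structure cross."""
    return bool(crossing_pairs(parse_pairs(dot_bracket)))


if __name__ == "__main__":
    print(has_pseudoknot("((((....))))"))            # simple hairpin
    print(has_pseudoknot("((((..[[[[..))))..]]]]"))  # H-type pseudoknot
```

The pairwise scan is quadratic in the number of base pairs, which is adequate for checking a finished model; it says nothing about the much harder problem of searching fold space for the best pseudoknotted structure.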
RNA secondary structure modeling is a challenging problem, and recent successes have raised the standards for accuracy, consistency, and tractability. Large increases in accuracy have been achieved by including data on reactivity toward chemical probes: Incorporation of 1M7 SHAPE reactivity data into an mfold-class algorithm results in median accuracies for base pair prediction that exceed 90%. However, a few RNA structures are modeled with significantly lower accuracy. Here, we show that incorporating differential reactivities from the NMIA and 1M6 reagents, which detect noncanonical and tertiary interactions, into prediction algorithms results in highly accurate secondary structure models for RNAs that were previously shown to be difficult to model. For these RNAs, 93% of accepted canonical base pairs were recovered in SHAPE-directed models. Discrepancies between accepted and modeled structures were small and appear to reflect genuine structural differences. Three-reagent SHAPE-directed modeling scales concisely to structurally complex RNAs to resolve the in-solution secondary structure analysis problem for many classes of RNA.
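A statement such as "93% of accepted canonical base pairs were recovered" is a sensitivity measurement: the fraction of accepted pairs that also appear in the model. A minimal Python sketch of that calculation follows, together with the complementary positive predictive value (the fraction of modeled pairs that are accepted). The function name and the exact-match scoring rule are assumptions made for illustration; the abstract does not specify its scoring rules, and published benchmarks sometimes count a pair as correct when it is shifted by one nucleotide.

```python
def score_prediction(accepted, predicted):
    """Return (sensitivity, ppv) for a predicted set of base pairs.

    Each structure is an iterable of (i, j) nucleotide index pairs.
    Pairs are compared by exact match after ordering each pair as (low, high).
    """
    accepted = {tuple(sorted(pair)) for pair in accepted}
    predicted = {tuple(sorted(pair)) for pair in predicted}
    shared = accepted & predicted
    # Sensitivity: fraction of accepted pairs recovered by the model.
    sensitivity = len(shared) / len(accepted) if accepted else 0.0
    # PPV: fraction of modeled pairs that occur in the accepted structure.
    ppv = len(shared) / len(predicted) if predicted else 0.0
    return sensitivity, ppv


if __name__ == "__main__":
    # Toy structures, invented for this example only.
    accepted = [(1, 20), (2, 19), (3, 18), (4, 17)]
    predicted = [(1, 20), (2, 19), (3, 18), (5, 16), (6, 15)]
    sens, ppv = score_prediction(accepted, predicted)
    print(f"sensitivity = {sens:.2f}, PPV = {ppv:.2f}")
```

Reporting both numbers matters because a model can reach high sensitivity simply by predicting many pairs, which lowers its positive predictive value.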
All retroviral genomic RNAs contain a cis-acting packaging signal by which dimeric genomes are selectively packaged into nascent virions. However, it is not understood how Gag (the viral structural protein) interacts with these signals to package the genome with high selectivity. We probed the structure of murine leukemia virus RNA inside virus particles using SHAPE, a high-throughput RNA structure analysis technology. These experiments showed that NC (the nucleic acid binding domain derived from Gag) binds within the virus to the sequence UCUG-UR-UCUG. Recombinant Gag and NC proteins bound to this same RNA sequence in dimeric RNA in vitro; in all cases, interactions were strongest with the first U and final G in each UCUG element. The RNA structural context is critical: High-affinity binding requires base-paired regions flanking this motif, and two UCUG-UR-UCUG motifs are specifically exposed in the viral RNA dimer. Mutating the guanosine residues in these two motifs (only four nucleotides per genomic RNA) reduced packaging 100-fold, comparable to the level of nonspecific packaging. These results thus explain the selective packaging of dimeric RNA. This paradigm has implications for RNA recognition in general, illustrating how local context and RNA structure can create information-rich recognition signals from simple single-stranded sequence elements in large RNAs.
retrovirus | RNA recognition code | RNA SHAPE chemistry
Expression of a single viral protein, termed Gag, is sufficient for assembly of retrovirus-like particles in mammalian cells. If present in the cell, the viral genomic RNA (vRNA) is selectively packaged into nascent particles; this selectivity is due to a cis-acting packaging signal in the RNA, termed Ψ (1, 2).
Remarkably, when no Ψ-containing RNA is present, Gag still assembles efficiently, encapsidating cellular mRNAs nonselectively in place of the vRNA (3-5). There are many indications that Ψ represents a high-affinity binding site for the Gag protein both in HIV-1 and in simpler retroviruses (6-14). However, the molecular mechanisms underlying selective encapsidation of vRNAs are incompletely understood, as are the features that enable Gag to bind preferentially to vRNA rather than to other cellular RNAs. Gag proteins contain several distinct domains, always including matrix (MA), capsid, and nucleocapsid (NC). vRNA packaging is mediated by the multidomain Gag protein, but Gag is cleaved following release of the virus from the cell. The NC domain plays a principal role in interactions with nucleic acids and is largely responsible for the specific interaction between Gag and its cognate viral RNA (12, 13). This domain of Gag is highly basic and contains one or more "zinc knuckles" with a conserved spacing of Zn2+-coordinating cysteine and histidine residues. Mutations that abolish Zn2+ coordination impair selective encapsidation of vRNA during virus assembly (6, 15). In addition, MA domains of many retroviral Gag proteins interact with nucleic acids (16-21) and may also contribute to specific i...
The difficulty of analyzing higher order RNA structure, especially for folding intermediates and for RNAs whose functions require domains that are conformationally flexible, emphasizes the need for new approaches for modeling RNA tertiary structure accurately. Here, we report a concise approach that makes use of facile RNA structure probing experiments that are then interpreted using a computational algorithm, carefully tailored to optimize both the resolution and refinement speed for the resulting structures, without requiring user intervention. The RNA secondary structure is first established using SHAPE chemistry. We then use a sequence-directed cleavage agent, that can be placed arbitrarily in many helical motifs, to obtain high quality inter-residue distances. We interpret this in-solution chemical information using a fast, coarse grained, discrete molecular dynamics engine in which each RNA nucleotide is represented by pseudoatoms for the phosphate, ribose and nucleobase groups. By this approach, we refine base paired positions in yeast tRNA Asp to 4 Å RMSD without any preexisting information or assumptions about secondary or tertiary structures. This blended experimental and computational approach has the potential to yield native-like models for the diverse universe of functionally important RNAs whose structures cannot be characterized by conventional structural methods.
Accurate RNA structure modeling is an important, incompletely solved, challenge. Single-nucleotide resolution SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) yields an experimental measurement of local nucleotide flexibility that can be incorporated as pseudo-free energy change constraints to direct secondary structure predictions. Prior work from our laboratory has emphasized both the overall accuracy of this approach and the need for nuanced interpretation of some apparent discrepancies between modeled and accepted structures. Recent studies by Das and colleagues [Kladwang et al., Biochemistry 50:8049 (2011) and Nat. Chem. 3:954 (2011)], focused on analyzing six small RNAs, yielded poorer RNA secondary structure predictions than expected based on prior benchmarking efforts. To understand the features that led to these divergent results, we re-examined four RNAs yielding the poorest results in this recent work – tRNAPhe, the adenine and cyclic-di-GMP riboswitches, and 5S rRNA. Most of the errors reported by Das and colleagues reflected non-standard experiment and data processing choices, and selective scoring rules. For two RNAs, tRNAPhe and the adenine riboswitch, secondary structure predictions are nearly perfect if no experimental information is included but were rendered inaccurate by the Das and colleagues SHAPE data. When best practices were used, single-sequence SHAPE-directed secondary structure modeling recovered ~93% of individual base pairs and greater than 90% of helices in the four RNAs, essentially indistinguishable from the mutate-and-map approach with the exception of a single helix in the 5S rRNA. The field of experimentally-directed RNA secondary structure prediction is entering a phase focused on the most difficult prediction challenges. We outline five constructive principles for guiding this field forward.
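The phrase "pseudo-free energy change constraints" can be made concrete with a short Python sketch. The abstract does not give the functional form, so the sketch assumes the log-linear relation widely used in SHAPE-directed folding, ΔG_SHAPE(i) = m ln(reactivity_i + 1) + b, applied to nucleotides in base-paired stacks. The default slope and intercept (2.6 and -0.8 kcal/mol) are commonly cited values and are likewise an assumption here, as are the function name and the handling of missing or negative reactivities; check the settings of the folding software actually in use.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo-free energy term (kcal/mol) for one nucleotide.

    Assumed form: slope * ln(reactivity + 1) + intercept.
    `reactivity` is a normalized SHAPE reactivity; None means no data.
    """
    if reactivity is None:
        # Assumption: nucleotides without data contribute no pseudo-energy.
        return 0.0
    # Assumption: small negative values from background subtraction are
    # treated as zero reactivity so the logarithm stays defined.
    reactivity = max(reactivity, 0.0)
    return slope * math.log(reactivity + 1.0) + intercept


if __name__ == "__main__":
    # Invented reactivities, for illustration only.
    for value in (0.0, 0.3, 1.0, 2.5, None):
        print(value, round(shape_pseudo_energy(value), 3))
```

Under these assumed parameters, an unreactive nucleotide receives a favorable bonus for pairing (negative pseudo-energy) and a highly reactive nucleotide receives a penalty, which is how flexibility measurements steer the predicted fold.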
Retroviral genomes are dimeric, comprised of two sense-strand RNAs linked at their 5′ ends by noncovalent base pairing and tertiary interactions. Viral maturation involves large-scale morphological changes in viral proteins and in genomic RNA dimer structures to yield infectious virions. Structural studies have largely focused on simplified in vitro models of genomic RNA dimers even though the relationship between these models and authentic viral RNA is unknown. We evaluate the secondary structure of the minimal dimerization domain in genomes isolated from Moloney murine leukemia virions using a quantitative and single nucleotide resolution RNA structure analysis technology (selective 2′-hydroxyl acylation analyzed by primer extension, or SHAPE). Results are consistent with an architecture in which the RNA dimer is stabilized by four primary interactions involving two sets of intermolecular base pairs and two loop-loop interactions. The dimerization domain can independently direct its own folding since heating and refolding reproduce the same structure as visualized in genomic RNA isolated from virions. Authentic ex virio RNA has a SHAPE reactivity profile similar to that of a simplified transcript dimer generated in vitro, with the important exception of a region that appears to form a compact stem-loop only in the virion-isolated RNA. Finally, we analyze the conformational changes that accompany folding of monomers into dimers in vitro. These experiments support well-defined structural models for an authentic dimerization domain and also emphasize that many features of mature genomic RNA dimers can be reproduced in vitro using properly designed, simplified RNAs. Retroviruses, including both simple model viruses and complex viruses like human immunodeficiency virus (HIV), contain genomes in the form of two coding RNA strands, noncovalently linked at their 5′ ends (11, 19, 20, 26, 38, 45). This 5′ linkage is termed the genomic RNA dimer.
Packaging of RNA genomes into new virions is highly specific, even in the presence of a large background of cellular RNA (1, 7, 26). This packaging function is carried out by the Gag protein (14, 30, 43), which recognizes RNA sequences that overlap with the RNA dimerization domain (16, 26, 42, 44). The specific Gag-dimer interaction represents an elegant and direct mechanism by which exactly two RNA genomes are packaged into each nascent virion. The genomic RNA dimer is initially assembled into an immature and noninfectious viral particle (9, 18-20). After the immature particle buds from the host cell, it undergoes extensive morphological changes to form the mature and infectious virion (21, 51). Maturation is initiated through cleavage of the Gag polyprotein by the viral protease to yield smaller Gag-derived proteins and also involves changes in the structure of the RNA dimer region. The RNA dimer structure appears to be more compact and, for many retroviruses, more thermostable in mature than in immature virions (18-20, 26). The closely related Moloney murine leukemia and sarcoma viruse...