SARS-CoV-2 is a betacoronavirus with a linear single-stranded, positive-sense RNA genome, whose outbreak caused the ongoing COVID-19 pandemic. The ability of coronaviruses to rapidly evolve, adapt, and cross species barriers makes the development of effective and durable therapeutic strategies a challenging and urgent need. As for other RNA viruses, genomic RNA structures are expected to play crucial roles in several steps of the coronavirus replication cycle. Despite this, only a handful of functionally-conserved coronavirus structural RNA elements have been identified to date. Here, we performed RNA structure probing to obtain single-base resolution secondary structure maps of the full SARS-CoV-2 coronavirus genome both in vitro and in living infected cells. Probing data recapitulate the previously described coronavirus RNA elements (5′ UTR and s2m), and reveal new structures. Of these, ∼10.2% show significant covariation among SARS-CoV-2 and other coronaviruses, hinting at their functionally-conserved role. Secondary structure-restrained 3D modeling of these segments further allowed for the identification of putative druggable pockets. In addition, we identify a set of single-stranded segments in vivo, showing high sequence conservation, suitable for the development of antisense oligonucleotide therapeutics. Collectively, our work lays the foundation for the development of innovative RNA-targeted therapeutic strategies to fight SARS-related infections.
The molecular architecture of protein-RNA interfaces is analyzed using a non-redundant dataset of 152 protein-RNA complexes. We find that an average protein-RNA interface is smaller than an average protein-DNA interface but larger than an average protein-protein interface. Among the different classes of protein-RNA complexes, interfaces with tRNA are the largest, while interfaces with single-stranded RNA are the smallest. Significantly, RNA contributes more to the interface area than its partner protein. Moreover, unlike protein-protein interfaces, where the side chain contributes less to the interface area than the main chain, the main chain and side chain contributions are reversed in protein-RNA interfaces. We find that the protein surface in contact with the RNA in protein-RNA complexes is better packed than that in contact with the DNA in protein-DNA complexes, but more loosely packed than that in contact with the protein in protein-protein complexes. Shape complementarity and electrostatic potential are the two major factors that determine the specificity of the protein-RNA interaction. We find that the H-bond density at protein-RNA interfaces is similar to that at protein-DNA interfaces but higher than that at protein-protein interfaces. Unlike protein-DNA interfaces, where the deoxyribose has little role in intermolecular H-bonds, the ribose in RNA plays a significant role in protein-RNA H-bonds owing to the presence of an oxygen atom at the 2' position. We find that, besides H-bonds, salt bridges and stacking interactions also play a significant role in stabilizing protein-nucleic acid interfaces; however, their contribution at protein-protein interfaces is insignificant.
We use evolutionary conservation derived from structure alignment of polypeptide sequences, along with structural and physicochemical attributes of protein–RNA interfaces, to probe the binding hot spots at protein–RNA recognition sites. We find that the degree of conservation varies across RNA binding proteins; some evolve rapidly compared to others. Additionally, irrespective of the structural class of the complexes, residues at the RNA binding sites are evolutionarily better conserved than those at the solvent-exposed surfaces. For recognition involving duplex RNA, residues interacting with the major groove are better conserved than those interacting with the minor groove. We identify multi-interface residues participating simultaneously in protein–protein and protein–RNA interfaces in complexes where more than one polypeptide is involved in RNA recognition, and show that they are better conserved than any other RNA binding residues. We find that the residues at water preservation sites are better conserved than those at hydrated or dehydrated sites. Finally, we develop a Random Forests model using structural and physicochemical attributes for predicting binding hot spots. The model accurately predicts 80% of the instances of experimental ΔΔG values in a particular class, and provides a stepping-stone towards the engineering of protein–RNA recognition sites with desired affinity.
Background: Computational models of RNA 3D structure often present various inaccuracies caused by simplifications used in structure prediction methods, such as template-based modeling or coarse-grained simulations. To obtain a high-quality model, the preliminary RNA structural model needs to be refined, taking into account atomic interactions. The goal of the refinement is not only to improve the local quality of the model but to bring it globally closer to the true structure. Results: We present QRNAS, a software tool for fine-grained refinement of nucleic acid structures, which is an extension of the AMBER simulation method with additional restraints. QRNAS is capable of handling RNA, DNA, chimeras, and hybrids thereof, and enables modeling of nucleic acids containing modified residues. Conclusions: We demonstrate the ability of QRNAS to improve the quality of models generated with different methods. QRNAS was able to improve MolProbity scores of NMR structures, as well as of computational models generated in the course of the RNA-Puzzles experiment. The overall geometry improvement may be associated with increased model accuracy, especially on the level of correctly modeled base-pairs, but the systematic improvement of root mean square deviation to the reference structure should not be expected. The method has been integrated into a computational modeling workflow, enabling improved RNA 3D structure prediction.
SARS-CoV-2 is a betacoronavirus with a linear single-stranded, positive-sense RNA genome of ~30 kb, whose outbreak caused the still ongoing COVID-19 pandemic. The ability of coronaviruses to rapidly evolve, adapt, and cross species barriers makes the development of effective and durable therapeutic strategies a challenging and urgent need. As for other RNA viruses, genomic RNA structures are expected to play crucial roles in several steps of the coronavirus replication cycle. Despite this, only a handful of functionally conserved structural elements within coronavirus RNA genomes have been identified to date. Here, we performed RNA structure probing by SHAPE-MaP to obtain a single-base resolution secondary structure map of the full SARS-CoV-2 coronavirus genome. The SHAPE-MaP probing data recapitulate the previously described coronavirus RNA elements (5′ UTR, ribosomal frameshifting element, and 3′ UTR), and reveal new structures. Secondary structure-restrained 3D modeling of highly structured regions across the SARS-CoV-2 genome allowed for the identification of several putative druggable pockets. Furthermore, ~8% of the identified structure elements show significant covariation among SARS-CoV-2 and other coronaviruses, hinting at their functionally conserved role. In addition, we identify a set of persistently single-stranded regions having high sequence conservation, suitable for the development of antisense oligonucleotide therapeutics. Collectively, our work lays the foundation for the development of innovative RNA-targeted therapeutic strategies to fight SARS-related infections.
We have developed a nonredundant protein-RNA docking benchmark dataset, which is derived from the available bound and unbound structures in the Protein Data Bank involving polypeptide and nucleic acid chains. It consists of nine unbound-unbound cases where both the protein and the RNA are available in the free form. The other 36 cases are of the unbound-bound type, where only the protein is available in the free form. The conformational change upon complex formation is calculated by a distance matrix alignment method, and based on that, complexes are classified into rigid, semi-flexible, and fully flexible. Whereas no significant conformational change accompanies complex formation in the rigid-body category, the fully flexible test cases show large domain movements, RNA base flips, etc. The benchmark covers four major groups of RNA, namely, tRNA, ribosomal RNA, duplex RNA, and single-stranded RNA. We find that RNA is generally more flexible than the protein in the complexes, and the interface region is as flexible as the molecule as a whole. The structural diversity of the complexes in the benchmark set should provide a common ground for the development and comparison of protein-RNA docking methods. The benchmark can be freely downloaded from the internet.
Background: MicroRNAs (miRNAs) are endogenous, noncoding, short RNAs directly involved in regulating gene expression at the post-transcriptional level. Despite their immense importance, the limited information on P. vulgaris miRNAs and their expression patterns prompted us to identify new miRNAs in P. vulgaris by computational methods. Besides conventional approaches, we have used simple sequence repeat (SSR) signatures as one of the prediction parameters. Moreover, for all other parameters, including normalized Shannon entropy, normalized base pairing index and normalized base-pair distance, instead of taking a fixed cut-off value, we have used a 99% probability range derived from the available data. Results: We have identified 208 mature miRNAs in P. vulgaris belonging to 118 families, of which 201 are novel. Of the predicted miRNAs in P. vulgaris, 97 were validated with data obtained from small RNA sequencing of P. vulgaris. Randomly selected predicted miRNAs were also validated using qRT-PCR. A total of 1305 target sequences were identified for 130 predicted miRNAs. Using an 80% sequence identity cut-off, proteins coded by 563 targets were identified. The computational method developed in this study was also validated by predicting 229 miRNAs of A. thaliana and 462 miRNAs of G. max, of which 213 for A. thaliana and 397 for G. max are present in miRBase 20. Conclusions: There is no universal SSR that is conserved among all precursors of Viridiplantae, but a conserved SSR exists within a miRNA family and is used as a signature in our prediction method. Prediction of known miRNAs of A. thaliana and G. max validates the accuracy of our method. Our findings will contribute to the present knowledge of miRNAs and their targets in P. vulgaris. This computational method can be applied to any species of Viridiplantae for the successful prediction of miRNAs and their targets.
Electronic supplementary material: The online version of this article (doi:10.1186/s12870-015-0516-3) contains supplementary material, which is available to authorized users.
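The idea in the miRNA abstract above of replacing a fixed cut-off with a 99% probability range derived from known data can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `probability_range` and `within_range` are hypothetical, the feature values are invented for illustration, and the sketch assumes the feature is roughly normally distributed, which the abstract does not state.

```python
from statistics import NormalDist, mean, stdev


def probability_range(values, coverage=0.99):
    """Return (low, high) bounds of the central `coverage` interval.

    Assumption (not from the source): the feature values of known
    precursors are approximately normally distributed.
    """
    dist = NormalDist(mean(values), stdev(values))
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def within_range(value, bounds):
    """Check whether a candidate's feature value lies inside the bounds."""
    low, high = bounds
    return low <= value <= high


# Invented feature values for known precursors (illustrative only).
known = [0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.58]
bounds = probability_range(known)

print(within_range(0.49, bounds))  # candidate close to the known values
print(within_range(0.95, bounds))  # candidate far from the known values
```

A candidate would pass this filter only if its value falls inside the range for each parameter, so the acceptance window follows the spread of the known data rather than a hand-picked threshold.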
We present an updated version of the protein-RNA docking benchmark, which we first published four years ago. The non-redundant protein-RNA docking benchmark version 2.0 consists of 126 test cases, a threefold increase in number compared to its previous version. The present version consists of 21 unbound-unbound cases, of which, in 12 cases, the unbound RNAs are taken from another complex. It also consists of 95 unbound-bound cases where only the protein is available in the unbound state. In addition, we introduce 10 new bound-unbound cases where only the RNA is found in the unbound state. Based on the degree of conformational change of the interface residues upon complex formation, the benchmark is classified into 72 rigid-body cases, 25 semiflexible cases and 19 fully flexible cases. It also covers a wide range of conformational flexibility, from small side chain movements to large domain swapping in protein structures, as well as flipping and restacking of RNA bases. This benchmark should provide the docking community with more test cases for evaluating rigid-body as well as flexible docking algorithms. It will also facilitate the development of new algorithms that require a large training set. The protein-RNA docking benchmark version 2.0 can be freely downloaded from http://www.csb.iitkgp.ernet.in/applications/PRDBv2. Proteins 2017; 85:256-267. © 2016 Wiley Periodicals, Inc.