RNA structures have multiple levels of structural organization. A secondary structure, made of Watson–Crick helices connected by loops, forms a scaffold for the tertiary structure. The 3D structures adopted by these loops are therefore critical determinants of the global 3D architecture. Earlier studies showed that these local 3D structures can be described as conserved sets of ordered non-Watson–Crick base pairs called RNA structural modules. Unfortunately, current 3D module identification methods are still too limited in computational efficiency and scope to benefit from all the knowledge accumulated in module databases. We present , an automated, efficient and customizable tool for (i) building Bayesian networks representing RNA 3D modules and (ii) rapidly identifying 3D modules in sequences. uses a flexible definition of RNA 3D modules that allows us to consider complex architectures such as multi-branched loops, and it features multiple algorithmic improvements. We benchmarked our methods using cross-validation on 3409 RNA chains and show that achieves up to ∼70% identification accuracy on module positions and base pair interactions. can handle a broader range of motifs (versatility) and offers considerable running time improvements (efficiency), opening the door to a broad range of large-scale applications.
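To make the idea of a Bayesian network over module positions concrete, here is a toy sketch in Python. Everything in it is a hypothetical assumption for illustration, not the tool's actual model: a four-position module, a single dependency linking one nucleotide to its non-canonical pairing partner, invented conditional probability tables, and a contiguous sliding-window scan against a uniform background (real modules usually span several strands). Each window is scored by log-odds, and windows above `threshold` are reported.

```python
import math

BASES = "ACGU"


def uniform():
    return {b: 0.25 for b in BASES}


# Hypothetical 4-position module: positions 0 and 3 form a non-canonical pair,
# so position 3 is conditioned on (has as parent) position 0.
PARENTS = {0: None, 1: None, 2: None, 3: 0}

# Hypothetical conditional probability tables: CPT[pos][parent_base][base].
CPT = {
    0: {None: {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1}},
    1: {None: uniform()},
    2: {None: {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1}},
    3: {p: ({"A": 0.85, "C": 0.05, "G": 0.05, "U": 0.05} if p == "G" else uniform())
        for p in BASES},
}


def log_likelihood(window):
    """Log P(window | module): sum of log CPT entries given each node's parent."""
    total = 0.0
    for pos, parent in PARENTS.items():
        parent_base = window[parent] if parent is not None else None
        total += math.log(CPT[pos][parent_base][window[pos]])
    return total


def scan(seq, threshold=1.0):
    """Return (start, log-odds) for windows scoring above a uniform background."""
    k = len(PARENTS)
    background = k * math.log(0.25)
    hits = []
    for i in range(len(seq) - k + 1):
        score = log_likelihood(seq[i:i + k]) - background
        if score > threshold:
            hits.append((i, round(score, 3)))
    return hits


# Only the window starting at index 3 ("GCAA") scores above the threshold.
print(scan("UUUGCAAUU"))
```

Conditioning a nucleotide on its pairing partner is what lets such a network capture covariation within a pair, which a profile of independent positions cannot represent.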
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) enclosing loops. While stems are precisely captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but computational complexity has limited their study to local elements, i.e. loops. Representing the RNA structure as a graph has recently allowed this work to be extended to pairs of SSEs, uncovering a hierarchical organization of these 3D modules, at great computational cost. Systematically capturing recurrent patterns on a large scale is a major challenge in the study of RNA structures. In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. We extend this algorithm into a framework well suited to identifying RNA modules and fast enough to considerably generalize previous approaches. To exhibit the versatility of our framework, we first reproduce results identifying all common modules spanning more than 2 SSEs, in a few hours instead of weeks. The efficiency of our new algorithm is demonstrated by computing the maximal modules between every pair of entire RNAs in the non-redundant corpus of known RNA 3D structures. We observe that the biggest modules our method uncovers form large shared substructures spanning hundreds of nucleotides and base pairs between the ribosomes of Thermus thermophilus, Escherichia coli, and Pseudomonas aeruginosa.
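The edge-colored graph representation can be sketched as follows, under assumptions of our own rather than the paper's exact encoding: nodes are nucleotide indices, backbone steps are colored `B53`/`B35` (the direction is encoded in the color), and each base pair carries a Leontis–Westhof-style label such as `cWW` or `tSH`, read from the first nucleotide's side. The input format and the helper names `flip`, `build_colored_graph` and `is_proper` are hypothetical. `is_proper` checks that no node has two incident edges of the same color, i.e. that the coloring is a proper edge coloring.

```python
from collections import defaultdict


def flip(family):
    """'cWH' seen from the other nucleotide is 'cHW': swap the two edge letters."""
    return family[0] + family[2] + family[1]


def build_colored_graph(n, base_pairs):
    """Return adj[node][color] -> list of neighbors for an n-nucleotide chain.

    base_pairs: iterable of (i, j, family) tuples (hypothetical input format).
    """
    adj = defaultdict(lambda: defaultdict(list))
    for i in range(n - 1):
        adj[i]["B53"].append(i + 1)  # 5' -> 3' backbone step
        adj[i + 1]["B35"].append(i)  # same step seen from the 3' side
    for i, j, family in base_pairs:
        adj[i][family].append(j)
        adj[j][flip(family)].append(i)
    return adj


def is_proper(adj):
    """True if every node has at most one incident edge of each color."""
    return all(len(nbrs) <= 1
               for by_color in adj.values()
               for nbrs in by_color.values())


# Toy hairpin: a cWW closing pair and a tSH pair one step inside it.
hairpin = build_colored_graph(6, [(0, 5, "cWW"), (1, 4, "tSH")])
print(is_proper(hairpin))  # True

# A nucleotide forming two pairs of the same family breaks the property.
triple = build_colored_graph(6, [(0, 5, "cWW"), (0, 3, "cWW")])
print(is_proper(triple))  # False
```

Encoding backbone direction and pair orientation in the color is one way to keep the coloring proper; bifurcated interactions such as the second example would need extra handling.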
RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure, composed of stems (stacked canonical base pairs) enclosing loops. While stems are captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries not captured by energetic models are leveraged for structure prediction and design, but computational complexity has limited their study to local elements (loops) and, more recently, to geometries covering pairs of SSEs. Systematically capturing recurrent patterns on a large scale is a major challenge in the study of RNA structures. In this paper, to automatically capture this topological information, we present a new, general and efficient algorithm that leverages the fact that a proper edge coloring can be assigned to the graphs representing such structures. This allows us to generalize previous approaches and systematically find, for the first time, modules spanning more than 2 SSEs, while improving speed a hundredfold. We then proceed to extract all recurrent base pair networks between any pair of RNA tertiary structures in our non-redundant dataset. We observed occurrences spanning 36 different SSEs between the 23S ribosomal RNAs of E. coli and Thermus thermophilus. In addition to detecting them, our method organizes them into a network according to the similarity of their structures. Relaxing constraints, such as not differentiating between local and distant interactions, reduces the number of isolated components in the network of structures. This behaviour can be leveraged to study the emergence of these intricate structures.
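A simplified sketch of why a proper edge coloring helps: when every node has at most one incident edge of each color, fixing one matched pair of nucleotides forces the rest of a connected common subgraph, because each color leads to at most one neighbor on each side. The Python below grows a mapping from a seed pair by breadth-first search along same-colored edges and keeps the seed yielding the most shared edges. It is an illustration under that assumption (graphs given as `node -> {color: neighbor}`; the names `propagate` and `largest_common_module` are hypothetical), not the authors' algorithm, and it skips conflicting extensions instead of exploring alternatives.

```python
from collections import deque


def propagate(g1, g2, seed1, seed2):
    """Grow an injective mapping from (seed1 -> seed2) along same-colored edges.

    g1, g2: dict node -> dict color -> neighbor (proper coloring assumed,
    so each color leads to at most one neighbor).
    Returns (mapping, shared) where shared holds (a, b, color) edges of g1
    whose images are edges of the same color in g2.
    """
    mapping = {seed1: seed2}
    used = {seed2}
    queue = deque([seed1])
    while queue:
        u = queue.popleft()
        v = mapping[u]
        for color, u2 in g1.get(u, {}).items():
            v2 = g2.get(v, {}).get(color)
            if v2 is None:
                continue  # no same-colored edge on the g2 side
            if u2 in mapping or v2 in used:
                continue  # already matched, or would break injectivity
            mapping[u2] = v2
            used.add(v2)
            queue.append(u2)
    shared = {(a, b, c)
              for a, nbrs in g1.items() if a in mapping
              for c, b in nbrs.items()
              if b in mapping and g2.get(mapping[a], {}).get(c) == mapping[b]}
    return mapping, shared


def largest_common_module(g1, g2):
    """Try every seed pair and keep the mapping with the most shared edges."""
    best = ({}, set())
    for s1 in g1:
        for s2 in g2:
            candidate = propagate(g1, g2, s1, s2)
            if len(candidate[1]) > len(best[1]):
                best = candidate
    return best


# Toy example: a 3-nucleotide strand (plus one cWW partner) vs. a bare strand.
g1 = {0: {"B53": 1}, 1: {"B35": 0, "B53": 2, "cWW": 3},
      2: {"B35": 1}, 3: {"cWW": 1}}
g2 = {10: {"B53": 11}, 11: {"B35": 10, "B53": 12}, 12: {"B35": 11}}
mapping, shared = largest_common_module(g1, g2)
print(mapping)  # {0: 10, 1: 11, 2: 12}
```

Each shared edge is counted once per direction (e.g. `B53` and `B35`). Trying all seed pairs costs about |V1|·|V2| propagations, each linear in the edges visited, with no backtracking over candidate matchings, which suggests why exploiting a proper coloring can pay off compared with a general subgraph isomorphism search.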