Identification of conserved regions in a set of RNA secondary structures remains an open research problem for large RNA molecules such as ribosomal RNA. We designed and implemented a method for conservancy annotation of a set of RNA molecules based on their secondary structures. The method first converts the secondary structures into linear representations, which are then passed to multiple sequence alignment (MSA). The resulting secondary structure-based MSA is subsequently passed to a conservancy identification procedure, which uses a sliding window technique to identify conserved positions in the MSA and assigns each a score based on the secondary structure content of the window. The algorithm can be used to rank the overall conservancy of the structures, which generally reflects evolutionary distance, as well as to assign conservancy to individual bases in order to identify high- or low-conservancy regions. We tested the algorithm for correlation with evolutionary distance, and the results match expectations. The method is freely available as a stand-alone tool implemented in the Python programming language.

Key words: RNA secondary structure, conservation, multiple sequence alignment.
Introduction

Ribonucleic acids (RNA) play an important role in many processes related to protein synthesis or the regulation of gene expression [1]. The function of an RNA molecule is determined by its structure, i.e., the shape the molecule adopts. This structure can be described at several levels. The simplest, the primary structure, expresses an RNA molecule as a linear sequence of nucleotides. The tertiary structure, on the other hand, describes the full three-dimensional structure of the molecule as a labeled set of 3D coordinates, making it the most informative model of an RNA molecule. However, obtaining the tertiary structure of a molecule is a complex problem, while progress in sequencing technology has made sequence data widely available; therefore, for most RNA molecules the sequence is available while the corresponding three-dimensional structure is not. In between, the secondary structure does not give atom positions in three-dimensional space, but rather describes the pattern of hydrogen bonds between bases (nucleotides). Two bases joined by hydrogen bonds are called a base pair. Base pairs are frequently organized into specific substructures such as helices, loops, junctions, hairpins, overhangs, bulges, or pseudoknots. These are called structural features or motifs and are considered the "building blocks" of RNA secondary structures. Secondary structure information is a relatively good approximation of the tertiary structure, since it shows which parts of the sequence come close to each other in the resulting 3D fold. While the computational prediction of tertiary structure is currently not possible, fortunately, there
International Journal of Bioscience, Biochemistry and Bioinformatics
Volume 6, Number 1, January 2016
Large RNA Secondary Structure Conservation Annotation Using S...