The functions of most long non-coding RNAs (lncRNAs) are unknown. In contrast to proteins, lncRNAs with similar functions often lack linear sequence homology; thus, the identification of function in one lncRNA rarely informs the identification of function in others. We developed a sequence comparison method to deconstruct linear sequence relationships in lncRNAs and evaluate similarity based on the abundance of short motifs called k-mers. We found that lncRNAs of related function often had similar k-mer profiles despite lacking linear homology, and that k-mer profiles correlated with protein binding to lncRNAs and with their subcellular localization. Using a novel assay to quantify Xist-like regulatory potential, we directly demonstrated that evolutionarily unrelated lncRNAs can encode similar function through different spatial arrangements of related sequence motifs. K-mer-based classification is a powerful approach to detect recurrent relationships between sequence and function in lncRNAs.
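The idea of comparing sequences by k-mer abundance rather than by linear alignment can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_profile`, `pearson`, and `kmer_similarity` are hypothetical, and the choices of k = 3, raw relative frequencies, and Pearson correlation as the similarity score are assumptions made for illustration. The published method may normalize and score profiles differently.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3, alphabet="ACGT"):
    """Return relative k-mer frequencies of seq in a fixed k-mer order."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def kmer_similarity(seq_a, seq_b, k=3):
    """Score two sequences by the correlation of their k-mer profiles."""
    return pearson(kmer_profile(seq_a, k), kmer_profile(seq_b, k))


# Toy sequences (invented for illustration, not real lncRNA sequence):
# `a` and `b` contain the same motifs in a different order; `c` shares none.
a = "ACGACGACGTTTTTT"
b = "TTTTTTACGACGACG"
c = "GGGGCCCCGGGGCCC"
print(kmer_similarity(a, b), kmer_similarity(a, c))
```

In this toy case the rearranged pair scores higher than the unrelated pair, even though `a` and `b` place their motifs in a different order, which is the kind of nonlinear relationship a k-mer profile is meant to capture.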
The marsupial inactive X chromosome expresses a long noncoding RNA (lncRNA) called Rsx that has been proposed to be the functional analog of eutherian Xist. Despite the possibility that Xist and Rsx encode related functions, the two lncRNAs harbor no linear sequence similarity. However, both lncRNAs harbor domains of tandemly repeated sequence. In Xist, these repeat domains are known to be critical for function. Using k-mer-based comparison, we show that the repeat domains of Xist and Rsx unexpectedly partition into two major clusters that each harbor substantial levels of nonlinear sequence similarity. Xist Repeats B, C, and D were most similar to each other and to Rsx Repeat 1, whereas Xist Repeats A and E were most similar to each other and to Rsx Repeats 2, 3, and 4. Similarities at the level of k-mers corresponded to domain-specific enrichment of protein-binding motifs. Within individual domains, protein-binding motifs were often enriched to extreme levels. Our data support the hypothesis that Xist and Rsx encode similar functions through different spatial arrangements of functionally analogous protein-binding domains. We propose that the two clusters of repeat domains in Xist and Rsx function in part to cooperatively recruit PRC1 and PRC2 to chromatin. The physical manner in which these domains engage with protein cofactors may be just as critical to the function of the domains as the protein cofactors themselves. The general approaches we outline in this report should prove useful in the study of any set of RNAs.
In this work, we have examined contributions to the thermodynamics of calmodulin (CaM) binding from the intrinsic propensity of target peptides to adopt an α-helical conformation. CaM target sequences are thought to commonly reside in disordered regions within proteins. Using the ability of trifluoroethanol (TFE) to induce α-helical structure as a proxy, we found that the six peptides studied range from having almost no propensity to adopt α-helical structure to having a very high propensity, even though all six have similar CaM-binding affinities. Our data indicate that there is some correlation between the deduced propensities and the thermodynamics of CaM binding. This finding implies that molecular recognition features, such as CaM target sequences, may possess a broad range of propensities to adopt local structure. Given that these peptides bind to CaM with similar affinities, the data suggest that a higher propensity to adopt α-helical structure does not necessarily result in tighter binding, and that the mechanism of CaM binding depends strongly on the nature of the substrate sequence.