The evolution of CRISPR–cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR–cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR–Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.
We present GraphProt, a computational framework for learning sequence- and structure-binding preferences of RNA-binding proteins (RBPs) from high-throughput experimental data. We benchmark GraphProt, demonstrating that the modeled binding preferences conform to the literature, and showcase the biological relevance and two applications of GraphProt models. First, estimated binding affinities correlate with experimental measurements. Second, predicted Ago2 targets display higher levels of expression upon Ago2 knockdown, whereas control targets do not. Computational binding models, such as those provided by GraphProt, are essential for predicting RBP binding sites and affinities in all tissues. GraphProt is freely available at http://www.bioinf.uni-freiburg.de/Software/GraphProt.
The Freiburg RNA tools webserver is a well established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), and RNA/protein-family models visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.
Tissue-specific alternative splicing is critical for the emergence of tissue identity during development, yet the role of this process in malignant transformation is undefined. Tissue-specific splicing involves evolutionarily conserved, alternative exons that represent only a minority of the total alternative exons identified. Many of these conserved exons have functional features that influence signaling pathways to profound biological effect. Here, we determined that lineage-specific splicing of a brain-enriched cassette exon in the membrane-binding tumor suppressor annexin A7 (ANXA7) diminishes endosomal targeting of the EGFR oncoprotein, consequently enhancing EGFR signaling during brain tumor progression. ANXA7 exon splicing was mediated by the ribonucleoprotein PTBP1, which is normally repressed during neuronal development. PTBP1 was highly expressed in glioblastomas due to loss of a brain-enriched microRNA (miR-124) and to PTBP1 amplification. The alternative ANXA7 splicing trait was present in precursor cells, suggesting that glioblastoma cells inherit the trait from a potential tumor-initiating ancestor and that these cells exploit this trait through accumulation of mutations that enhance EGFR signaling. Our data illustrate that lineage-specific splicing of a tissue-regulated alternative exon in a constituent of an oncogenic pathway eliminates tumor suppressor functions and promotes glioblastoma progression. This paradigm may offer a general model as to how tissue-specific regulatory mechanisms can reprogram normal developmental processes into oncogenic ones.
Motivation: Clustering according to sequence–structure similarity has now become a generally accepted scheme for ncRNA annotation. Its application to complete genomic sequences as well as whole transcriptomes is therefore desirable but hindered by extremely high computational costs.Results: We present a novel linear-time, alignment-free method for comparing and clustering RNAs according to sequence and structure. The approach scales to datasets of hundreds of thousands of sequences. The quality of the retrieved clusters has been benchmarked against known ncRNA datasets and is comparable to state-of-the-art sequence–structure methods although achieving speedups of several orders of magnitude. A selection of applications aiming at the detection of novel structural ncRNAs are presented. Exemplarily, we predicted local structural elements specific to lincRNAs likely functionally associating involved transcripts to vital processes of the human nervous system. In total, we predicted 349 local structural RNA elements.Availability: The pipeline is available on request.Contact: backofen@informatik.uni-freiburg.deSupplementary information: Supplementary data are available at Bioinformatics online.
Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs.Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0.Availability: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap.Contact: backofen@informatik.uni-freiburg.deSupplementary information: Supplementary data are available at Bioinformatics online.
Supplementary data are available at Bioinformatics online.
We propose a simple yet effective strategy to induce a task dependent feature representation using ensembles of random decision trees. The new feature mapping is efficient in space and time, and provides a metric transformation that is non parametric and not implicit in nature (i.e. not expressed via a kernel matrix), nor limited to the transductive setup. The main advantage of the proposed mapping lies in its flexibility to adapt to several types of learning tasks ranging from regression to multi-label classification, and to deal in a natural way with missing values. Finally, we provide an extensive empirical study of the properties of the learned feature representation over real and artificial datasets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.