The RNA World Hypothesis suggests that prebiotic life revolved around RNA instead of DNA and proteins. Although modern cells have changed significantly in 4 billion years, RNA has maintained its central role in cell biology. Since the discovery of DNA at the end of the nineteenth century, RNA has been extensively studied. Many discoveries such as housekeeping RNAs (rRNA, tRNA, etc.) supported the messenger RNA model that is the pillar of the central dogma of molecular biology, which was first devised in the late 1950s. Thirty years later, the first regulatory non-coding RNAs (ncRNAs) were initially identified in bacteria and then in most eukaryotic organisms. A few long ncRNAs (lncRNAs) such as H19 and Xist were characterized in the pre-genomic era but remained exceptions until the early 2000s. Indeed, when the sequence of the human genome was published in 2001, studies showed that only about 1.2% encodes proteins, the rest being deemed "non-coding." It was later shown that the genome is pervasively transcribed into many ncRNAs, but their functionality remained controversial. Since then, regulatory lncRNAs have been characterized in many species and were shown to be involved in processes such as development and pathologies, revealing a new layer of regulation in eukaryotic cells. This newly found focus on lncRNAs, together with the advent of high-throughput sequencing, was accompanied by the rapid discovery of many novel transcripts which were further characterized and classified according to specific transcript traits. In this review, we will discuss the many discoveries that led to the study of lncRNAs, from Friedrich Miescher's "nuclein" in 1869 to the elucidation of the human genome and transcriptome in the early 2000s. We will then focus on the biological relevance during lncRNA evolution and describe their basic features as genes and transcripts. Finally, we will present a non-exhaustive catalogue of lncRNA classes, thus illustrating the vast complexity of eukaryotic transcriptomes.
Single-nuclei RNA sequencing characterizes cell types at the gene level. However, compared to single-cell approaches, many single-nuclei cDNAs are purely intronic, lack barcodes and hinder the study of isoforms. Here we present single-nuclei isoform RNA sequencing (SnISOr-Seq). Using microfluidics, PCR-based artifact removal, target enrichment and long-read sequencing, SnISOr-Seq increased barcoded, exon-spanning long reads 7.5-fold compared to naive long-read single-nuclei sequencing. We applied SnISOr-Seq to adult human frontal cortex and found that exons associated with autism exhibit coordinated and highly cell-type-specific inclusion. We found two distinct combination patterns: those distinguishing neural cell types, enriched in TSS-exon, exon-polyadenylation-site and non-adjacent exon pairs, and those with multiple configurations within one cell type, enriched in adjacent exon pairs. Finally, we observed that human-specific exons are almost as tightly coordinated as conserved exons, implying that coordination can be rapidly established during evolution. SnISOr-Seq enables cell-type-specific long-read isoform analysis in human brain and in any frozen or hard-to-dissociate sample.
Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant—a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.
Epithelial-to-mesenchymal transition (EMT) describes the loss of epithelial traits and gain of mesenchymal traits by normal cells during development and by neoplastic cells during cancer metastasis. The long noncoding RNA HOTAIR triggers EMT, in part by serving as a scaffold for PRC2 and thus promoting repressive histone H3K27 methylation. In addition to PRC2, HOTAIR interacts with the LSD1 lysine demethylase, an epigenetic regulator of cell fate during development and differentiation, but little is known about the role of LSD1 in HOTAIR function during EMT. Here, we show that HOTAIR requires its LSD1-interacting domain, but not its PRC2-interacting domain, to promote the migration of epithelial cells. This activity is suppressed by LSD1 overexpression. LSD1-HOTAIR interactions induce partial reprogramming of the epithelial transcriptome altering LSD1 distribution at promoter and enhancer regions. Thus, we uncover an unexpected role of HOTAIR in EMT as an LSD1 decommissioning factor, counteracting its activity in the control of epithelial identity.
Long reads are reshaping RNA biology. However, determining alternative isoforms from long-read RNA data is a complex and incompletely solved problem even when the reference genome is known. Here we present IsoQuant - a reference-based tool that accurately discovers novel transcripts with at least 3-fold lower false positive rate and 1.8-fold increase in F1-score compared to other tools for Oxford Nanopore data. IsoQuant also increases performance for Pacific Biosciences data.
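The F1-score and false-positive figures quoted above depend on how predicted transcripts are matched against a ground truth. The following is a minimal sketch of one such scoring, not IsoQuant's own evaluation procedure. It assumes that a transcript is identified by its intron chain, that a prediction counts as correct only on an exact chain match, and that the false-positive figure is measured as the fraction of reported transcripts absent from the truth set. The function name `transcript_metrics` and the toy coordinates are hypothetical.

```python
def transcript_metrics(predicted, truth):
    """Score predicted transcripts against a truth set.

    Both arguments are sets of intron chains; each chain is a tuple of
    (start, end) intron coordinates. Exact-match scoring is an assumption
    made for this sketch.
    """
    true_pos = len(predicted & truth)    # reported and real
    false_pos = len(predicted - truth)   # reported but not real
    false_neg = len(truth - predicted)   # real but missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Fraction of reported transcripts that are wrong (assumed definition).
    false_discovery = false_pos / len(predicted) if predicted else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "false_discovery": false_discovery,
    }


# Toy data: four true transcripts, four predictions, three of them correct.
truth = {
    ((100, 200), (300, 400)),
    ((100, 200), (350, 400)),
    ((500, 600),),
    ((700, 800), (900, 950)),
}
predicted = {
    ((100, 200), (300, 400)),
    ((100, 200), (350, 400)),
    ((500, 600),),
    ((700, 810), (900, 950)),  # one splice site is off, so no exact match
}

metrics = transcript_metrics(predicted, truth)
print(metrics)
```

Under exact matching, a single shifted splice site turns a nearly correct transcript into both a false positive and a false negative, which is one reason fold changes in these metrics are sensitive to the matching rule chosen.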
The profiling of gene expression patterns to glean biological insights from single cells has become commonplace over the last few years. However, this approach overlooks the transcript contents that can differ between individual cells and cell populations. In this review, we describe early work in the field of single-cell short-read sequencing as well as full-length isoforms from single cells. We then describe recent work in single-cell long-read sequencing wherein some transcript elements have been observed to work in tandem. Based on earlier work in bulk tissue, we motivate the study of combination patterns of other RNA variables. Given that we are still blind to some aspects of isoform biology, we suggest possible future avenues such as CRISPR screens which can further illuminate the function of RNA variables in distinct cell populations.
Barcoding strategies are fundamental to droplet-based single-cell sequencing, and understanding the biases and caveats between approaches is essential. Here, we comprehensively evaluated both short and long reads of the cDNA obtained through the two marketed approaches from 10x Genomics, the "3' assay" and the "5' assay", which attach barcodes at different ends of the mRNA molecule. Although the barcode detection, cell-type identification, and gene expression profile are similar in both assays, the 5' assay captured more exonic molecules and fewer intronic molecules compared to the 3' assay. We found that 13.7% of genes sequenced have longer average read lengths and are more complete (spanning both polyA-site and TSS) in the long reads from the 5' assay compared to the 3' assay. These genes are characterized by long average transcript length, high intron number, and low expression overall. Despite these differences, cell-type-specific isoform profiles observed from the two assays remain highly correlated. This study provides a benchmark for choosing the single-cell assay for the intended research question, and insights regarding platform-specific biases to be mindful of when analyzing data, particularly across samples and technologies.