Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3′ mRNA counting of tens of thousands of single cells per sample. Cell encapsulation, of up to 8 samples at a time, takes place in ∼6 min, with ∼50% cell capture efficiency. To demonstrate the system's technical performance, we collected transcriptome data from ∼250k single cells across 29 samples. We validated the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. We profiled 68k peripheral blood mononuclear cells to demonstrate the system's ability to characterize large immune populations. Finally, we used sequence variation in the transcriptome data to determine host and donor chimerism at single-cell resolution from bone marrow mononuclear cells isolated from transplant patients.
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length cDNA sequencing with a multi-platform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. Comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single base-pair to megabase-sized variants. We identified ~17 thousand fixed human-specific structural variants identifying genic and putative regulatory changes that emerged in humans since divergence from nonhuman apes. Interestingly, these fixed human-specific structural variants are enriched near genes that are downregulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
The Fox-1 protein regulates alternative splicing of tissue-specific exons by binding to GCAUG elements. Here, we report the solution structure of the Fox-1 RNA binding domain (RBD) in complex with UGCAUGU. The last three nucleotides, UGU, are recognized in a canonical way by the four-stranded β-sheet of the RBD. In contrast, the first four nucleotides, UGCA, are bound by two loops of the protein in an unprecedented manner. Nucleotides U1, G2, and C3 are wrapped around a single phenylalanine, while G2 and A4 form a base-pair. This novel RNA binding site is independent from the β-sheet binding interface. Surface plasmon resonance analyses were used to quantify the energetic contributions of electrostatic and hydrogen bond interactions to complex formation and support our structural findings. These results demonstrate the unusual molecular mechanism of sequence-specific RNA recognition by Fox-1, which is exceptional in its high affinity for a defined but short sequence element.
Genes in prokaryotic genomes are often arranged into clusters and co-transcribed into polycistronic RNAs. Isolated examples of polycistronic RNAs were also reported in some higher eukaryotes but their presence was generally considered rare. Here we developed a long-read sequencing strategy to identify polycistronic transcripts in several mushroom-forming fungal species including Plicaturopsis crispa, Phanerochaete chrysosporium, Trametes versicolor, and Gloeophyllum trabeum. We found genome-wide prevalence of polycistronic transcription in these Agaricomycetes, involving up to 8% of the transcribed genes. Unlike polycistronic mRNAs in prokaryotes, these co-transcribed genes are also independently transcribed. We show that polycistronic transcription may interfere with expression of the downstream tandem gene. Further comparative genomic analysis indicates that polycistronic transcription is conserved among a wide range of mushroom-forming fungi. In summary, our study revealed, for the first time, the genome prevalence of polycistronic transcription in a phylogenetic range of higher fungi. Furthermore, we systematically show that our long-read sequencing approach and combined bioinformatics pipeline is a generic, powerful tool for precise characterization of complex transcriptomes that enables identification of mRNA isoforms not recovered via short-read assembly.
Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.
isoform discovery | PacBio | hESC transcriptome | alternative splicing | lncRNA
A vertebrate homologue of the Fox-1 protein from C. elegans was recently shown to bind to the element GCAUG and to act as an inhibitor of alternative splicing patterns in muscle. The element UGCAUG is a splicing enhancer element found downstream of numerous neuron-specific exons. We show here that mouse Fox-1 (mFox-1) and another homologue, Fox-2, are both specifically expressed in neurons in addition to muscle and heart. The mammalian Fox genes are very complex transcription units that generate transcripts from multiple promoters and with multiple internal exons whose inclusion is regulated. These genes produce a large family of proteins with variable N and C termini and internal deletions. We show that the overexpression of both Fox-1 and Fox-2 isoforms specifically activates splicing of neuronally regulated exons. This splicing activation requires UGCAUG enhancer elements. Conversely, RNA interference-mediated knockdown of Fox protein expression inhibits splicing of UGCAUG-dependent exons. These experiments show that this large family of proteins regulates splicing in the nervous system. They do this through a splicing enhancer function, in addition to their apparent negative effects on splicing in vertebrate muscle and in worms.
Alternative splicing allows the production of multiple mRNAs from a single pre-mRNA via selection of different splice sites. Regulated exons are controlled by splicing enhancer and silencer elements within the exon or in the adjacent introns. These RNA sequences bind to specific regulatory proteins that contribute to the tissue specificity of splicing. Most exons are controlled by combinations of both positive and negative regulators, and how tissue specificity of splicing is achieved is poorly understood (5, 44).
The N1 exon of the c-src gene serves as a model for an exon under both positive and negative control. In nonneuronal cells, the exon is repressed by the polypyrimidine tract binding protein (PTB) that binds to intronic splicing silencer elements flanking the N1 exon (1, 7, 9). In neurons, PTB-mediated repression is absent, and the exon is activated for splicing by an intronic splicing enhancer (4, 38). The enhancer region downstream of the N1 exon is complex, with binding sites for many proteins. However, the element most critical for enhancer activity is the sequence UGCAUG, which is flanked by PTB binding elements (4, 37, 38). Several proteins, including the hnRNPs F and H, the neuronal homologue of PTB, and the KH-type splicing regulatory protein, assemble onto this region in splicing extracts (8, 30, 34, 35). Immunodepletion and antibody inhibition experiments have indicated a role for these proteins in the splicing of N1 in vitro. However, none of these proteins specifically recognizes the UGCAUG element, and they do not positively affect an exon controlled by just a UGCAUG element in vivo (J. G. Underwood and D. L. Black, unpublished observations). Thus, they do not seem to mediate the function of the strongest enhancer element. Their function may be related to preventing P...
The recent development of third generation sequencing (TGS) generates much longer reads than second generation sequencing (SGS) and thus provides a chance to solve problems that are difficult to study through SGS alone. However, higher raw read error rates are an intrinsic drawback in most TGS technologies. Here we present a computational method, LSC, to perform error correction of TGS long reads (LR) by SGS short reads (SR). Aiming to reduce the error rate in homopolymer runs in the main TGS platform, the PacBio® RS, LSC applies a homopolymer compression (HC) transformation strategy to increase the sensitivity of SR-LR alignment without sacrificing alignment accuracy. We applied LSC to 100,000 PacBio long reads from human brain cerebellum RNA-seq data and 64 million single-end 75 bp reads from human brain RNA-seq data. The results show LSC can correct PacBio long reads to reduce the error rate by more than 3-fold. The improved accuracy greatly benefits many downstream analyses, such as directional gene isoform detection in RNA-seq study. Compared with another hybrid correction tool, LSC can achieve over double the sensitivity and similar specificity.
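The homopolymer compression idea mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not LSC's actual implementation: the function names `homopolymer_compress` and `homopolymer_decompress` and the example reads are hypothetical, chosen only to show how collapsing runs of identical bases makes two reads that differ only in homopolymer length compare as equal.

```python
from itertools import groupby


def homopolymer_compress(seq):
    """Collapse every run of identical bases to a single base.

    Returns the compressed sequence and the original run lengths,
    so the transformation can be reversed later.
    (Hypothetical helper for illustration, not part of LSC.)
    """
    bases = []
    run_lengths = []
    for base, run in groupby(seq):
        bases.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(bases), run_lengths


def homopolymer_decompress(compressed, run_lengths):
    """Rebuild the original sequence from a compressed sequence and run lengths."""
    return "".join(base * n for base, n in zip(compressed, run_lengths))


# Made-up reads: the long read has one extra A and one extra T
# inside homopolymer runs compared with the short read.
long_read = "AAACCGTTTTA"
short_read = "AACCGTTTA"

lr_compressed, lr_runs = homopolymer_compress(long_read)
sr_compressed, sr_runs = homopolymer_compress(short_read)

# After compression the homopolymer length differences disappear.
print(lr_compressed == sr_compressed)
# The run lengths retain the information needed to restore each read.
print(homopolymer_decompress(lr_compressed, lr_runs) == long_read)
```

Keeping the run lengths alongside the compressed string is one possible design choice; it lets an aligner work in compressed space while the original bases remain recoverable.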
High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.