Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.
Human genetic factors predispose to tuberculosis (TB). We studied 7.6 million genetic variants in 5,530 pulmonary TB patients and 5,607 healthy controls. In the combined analysis of these subjects and the follow-up cohort (15,087 TB patients and controls altogether), we found an association between TB and variants located in introns of the ASAP1 gene on chromosome 8q24 (P = 2.6 × 10⁻¹¹ for rs4733781; P = 1.0 × 10⁻¹⁰ for rs10956514). Dendritic cells (DCs) showed a high level of ASAP1 expression, which was reduced after M. tuberculosis infection, and rs10956514 was associated with the level of reduction of ASAP1 expression. The ASAP1 protein is involved in actin and membrane remodeling and has been associated with podosomes. The ASAP1-depleted DCs showed impaired matrix degradation and migration. Therefore, genetically determined excessive reduction of ASAP1 expression in M. tuberculosis-infected DCs may lead to their impaired migration, suggesting a potential novel mechanism that predisposes to TB.
TDP‐43 (encoded by the gene TARDBP) is an RNA binding protein central to the pathogenesis of amyotrophic lateral sclerosis (ALS). However, how TARDBP mutations trigger pathogenesis remains unknown. Here, we use novel mouse mutants carrying point mutations in endogenous Tardbp to dissect TDP‐43 function at physiological levels both in vitro and in vivo. Interestingly, we find that mutations within the C‐terminal domain of TDP‐43 lead to a gain of splicing function. Using two different strains, we are able to separate TDP‐43 loss‐ and gain‐of‐function effects. TDP‐43 gain‐of‐function effects in these mice reveal a novel category of splicing events controlled by TDP‐43, referred to as “skiptic” exons, in which skipping of constitutive exons causes changes in gene expression. In vivo, this gain‐of‐function mutation in endogenous Tardbp causes an adult‐onset neuromuscular phenotype accompanied by motor neuron loss and neurodegenerative changes. Furthermore, we have validated the splicing gain‐of‐function and skiptic exons in ALS patient‐derived cells. Our findings provide a novel pathogenic mechanism and highlight how TDP‐43 gain of function and loss of function affect RNA processing differently, suggesting they may act at different disease stages.
Devoy et al. develop the first mouse model to fully recapitulate human FUS-ALS, as defined by midlife-onset progressive degeneration of motor neurons with dominant inheritance. A toxic gain of function occurs in the absence of FUS protein aggregation, involving disturbance of ribosomes and mitochondria at the endoplasmic reticulum.
The mismatch repair gene MSH3 has been implicated as a genetic modifier of the CAG·CTG repeat expansion disorders Huntington’s disease and myotonic dystrophy type 1. A recent Huntington’s disease genome-wide association study found rs557874766, an imputed single nucleotide polymorphism located within a polymorphic 9 bp tandem repeat in MSH3/DHFR, as the variant most significantly associated with progression in Huntington’s disease. Using Illumina sequencing in Huntington’s disease and myotonic dystrophy type 1 subjects, we show that rs557874766 is an alignment artefact, the minor allele for which corresponds to a three-repeat allele in MSH3 exon 1 that is associated with a reduced rate of somatic CAG·CTG expansion (P = 0.004) and delayed disease onset (P = 0.003) in both Huntington’s disease and myotonic dystrophy type 1, and slower progression (P = 3.86 × 10⁻⁷) in Huntington’s disease. RNA-Seq of whole blood in the Huntington’s disease subjects found that repeat variants are associated with MSH3 and DHFR expression. A transcriptome-wide association study in the Huntington’s disease cohort found that increased MSH3 and DHFR expression are associated with disease progression. These results suggest that variation in the MSH3 exon 1 repeat region influences somatic expansion and disease phenotype in Huntington’s disease and myotonic dystrophy type 1, and that a common DNA repair mechanism operates in both repeat expansion diseases.
Transcriptional analysis of brain tissue from people with molecularly defined causes of obesity may highlight disease mechanisms and therapeutic targets. We performed RNA sequencing of hypothalamus from individuals with Prader-Willi syndrome (PWS), a genetic obesity syndrome characterized by severe hyperphagia. We found that upregulated genes overlap with the transcriptome of mouse Agrp neurons that signal hunger, while downregulated genes overlap with the expression profile of Pomc neurons activated by feeding. Downregulated genes are expressed mainly in neuronal cells and contribute to neurogenesis, neurotransmitter release, and synaptic plasticity, while upregulated, predominantly microglial genes are involved in inflammatory responses. This transcriptional signature may be mediated by reduced brain-derived neurotrophic factor expression. Additionally, we implicate disruption of alternative splicing as a potential molecular mechanism underlying neuronal dysfunction in PWS. Transcriptomic analysis of the human hypothalamus may identify neural mechanisms involved in energy homeostasis and potential therapeutic targets for weight loss.
High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3′ UTR shortening in cardiac fibroblasts. Sierra is available at https://github.com/VCCRI/Sierra.