A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS; cDNA libraries are then constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei from tissue homogenates at 4 °C, which causes minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at −80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. By following this procedure, it takes about 4 d to construct cDNA libraries that are ready for sequencing.
There is concern that the stresses of inducing pluripotency may lead to deleterious DNA mutations in induced pluripotent stem cell (iPSC) lines, which would compromise their use for cell therapies. Here we report comparative genomic analysis of nine isogenic iPSC lines generated using three reprogramming methods: integrating retroviral vectors, non-integrating Sendai virus and synthetic mRNAs. We used whole-genome sequencing and de novo genome mapping to identify single-nucleotide variants, insertions and deletions, and structural variants. Our results show a moderate number of variants in the iPSCs that were not evident in the parental fibroblasts, which may result from reprogramming. There were only small differences in the total numbers and types of variants among different reprogramming methods. Most importantly, a thorough genomic analysis showed that the variants were generally benign. We conclude that the process of reprogramming is unlikely to introduce variants that would make the cells inappropriate for therapy.
Detection of somatic variation using sequence from disease-control matched data sets is a critical first step. In many cases including cancer, however, it is hard to isolate pure disease tissue, and the impurity hinders accurate mutation analysis by disrupting overall allele frequencies. Here, we propose a new method, Virmid, that explicitly determines the level of impurity in the sample, and uses it for improved detection of somatic variation. Extensive tests on simulated and real sequencing data from breast cancer and hemimegalencephaly demonstrate the power of our model. A software implementation of our method is available at http://sourceforge.net/projects/virmid/.
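The effect of impurity on allele frequencies can be illustrated with a minimal Python sketch. It is not Virmid's estimator: the function names, the diploid assumption, the absence of copy-number changes, and the crude moment-based estimate are all assumptions made here for illustration only.

```python
def expected_vaf(alpha, disease_dosage, control_dosage=0):
    """Expected variant allele frequency in an impure disease sample.

    alpha: assumed fraction of control (non-disease) cells in the sample.
    *_dosage: number of variant alleles (0, 1 or 2) in a diploid genotype.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Mixture of control cells and disease cells, two alleles per cell.
    return alpha * control_dosage / 2 + (1 - alpha) * disease_dosage / 2


def crude_alpha_estimate(vafs):
    """Moment-based impurity estimate (illustrative assumption only).

    Assumes every site is a heterozygous somatic variant that is absent
    from control cells, so the expected frequency is (1 - alpha) / 2.
    """
    mean_vaf = sum(vafs) / len(vafs)
    return min(1.0, max(0.0, 1.0 - 2.0 * mean_vaf))


# A heterozygous somatic variant in a sample with 40% control cells
# is expected at a frequency below the 0.5 seen in pure disease tissue.
print(expected_vaf(0.4, disease_dosage=1))
print(crude_alpha_estimate([0.31, 0.29, 0.30]))
```

Under these assumptions, a pure sample gives a frequency of 0.5 at a heterozygous somatic site, and the frequency shrinks linearly as the control fraction grows, which is why ignoring impurity distorts mutation calls.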
Sperm are haploid, but must be functionally equivalent to distribute alleles equally among progeny. Accordingly, gene products are shared through spermatid cytoplasmic bridges which erase phenotypic differences between individual haploid sperm. Here, we show that a large class of mammalian genes are not completely shared across these bridges. We term these genes “genoinformative markers” (GIMs) and show that a subset can act as selfish genetic elements that spread alleles unevenly through murine, bovine, and human populations. We identify evolutionary pressure to avoid conflict between sperm and somatic function as GIMs are enriched for testis-specific gene expression, paralogs, and isoforms. Therefore, GIMs and sperm-level natural selection may help explain why testis gene expression patterns are an outlier relative to all other tissues.
I. ABSTRACT
Transcriptome-wide association studies (TWAS) have proven to be a powerful tool to identify genes associated with human diseases by aggregating cis-regulatory effects on gene expression. However, TWAS relies on building predictive models of gene expression, which are sensitive to the sample size and tissue on which they are trained. The Genotype-Tissue Expression (GTEx) Project has produced reference transcriptomes across 53 human tissues and cell types; however, the data is highly sparse, making it difficult to build polygenic models in relevant tissues for TWAS. Here, we propose fQTL, a multi-tissue, multivariate model for mapping expression quantitative trait loci and predicting gene expression. Our model decomposes eQTL effects into SNP-specific and tissue-specific components, pooling information across relevant tissues to effectively boost sample sizes. In simulation, we demonstrate that our multi-tissue approach outperforms single-tissue approaches in identifying causal eQTLs and tissues of action. Using our method, we fit polygenic models for 13,461 genes, characterized the tissue-specificity of the learned cis-eQTLs, and performed TWAS for Alzheimer's disease and schizophrenia, identifying 107 and 382 associated genes, respectively.
II. INTRODUCTION
A fundamental barrier to interpreting the role of non-coding genetic variation identified by genome-wide association studies (GWAS) is understanding how specific single nucleotide polymorphisms (SNPs) cause changes in gene expression, and how those changes in expression lead to downstream phenotypes. Recently, transcriptome-wide association studies (TWAS) have proven to be a powerful tool to predict the impact of cis-regulatory regions on expression and directly associate genes with downstream disease phenotypes [1,2].
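The general TWAS idea of predicting expression from genotype and then testing the predicted expression against a phenotype can be sketched in plain Python. This is a minimal illustration, not the fQTL method: the function names are hypothetical, the eQTL weights are assumed to have been trained elsewhere on a reference panel, and the association test is an ordinary single-predictor least-squares regression.

```python
import math


def impute_expression(genotypes, weights):
    """Predict expression per individual as a weighted sum of SNP dosages.

    genotypes: list of per-individual lists of allele dosages (0, 1, 2).
    weights: per-SNP eQTL weights (assumed pre-trained on a reference panel).
    """
    return [sum(g * w for g, w in zip(row, weights)) for row in genotypes]


def association_stat(expression, phenotype):
    """Regress phenotype on imputed expression; return (slope, t statistic)."""
    n = len(expression)
    mean_x = sum(expression) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in expression)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(expression, phenotype))
    beta = sxy / sxx
    resid_ss = sum((y - mean_y - beta * (x - mean_x)) ** 2
                   for x, y in zip(expression, phenotype))
    se = math.sqrt(resid_ss / (n - 2) / sxx)
    # Guard against a perfect fit, where the standard error is zero.
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t


# Toy cohort: five individuals, two cis-SNPs, made-up weights and phenotype.
genotypes = [[0, 1], [1, 1], [2, 0], [1, 2], [2, 2]]
weights = [0.5, -0.2]
phenotype = [0.1, 0.5, 2.1, 0.2, 1.0]
imputed = impute_expression(genotypes, weights)
beta, t = association_stat(imputed, phenotype)
print(imputed, beta, t)
```

In practice the weights come from a multivariate eQTL model, so the quality of the association statistic depends directly on how well that model was trained.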
The key idea of TWAS is to train multivariate expression quantitative trait loci (eQTL) models on reference expression panels, use these models to predict (impute) unobserved gene expression in large scale GWAS cohorts, and compute association statistics by regressing phenotype directly onto imputed gene expression.
The success of TWAS is dependent on the sample size of the expression reference panel and the tissue ...
We define $\Theta_{jt}$ to be the effect of SNP $j$ on gene expression of that gene in tissue $t$. We define $\sigma(z) = 1/(1 + \exp(-z))$ and we set $V_{\max} = V[y]$ and $V_{\min} = 10^{-4} V_{\max}$.
The key idea of fQTL is that we assume the eQTL effect size matrix $\Theta$ can be decomposed into tissue-invariant components $\theta^{\mathrm{snp}}$ and tissue-dependent components $\theta^{\mathrm{tis}}$: ...
Here, we assume $K = 1$ for ease of interpreting the results, although our inference algorithm (described below) supports fitting arbitrary $K \le m$.
(bioRxiv preprint first posted online Feb. 10, 2017; doi: http://dx.doi.org/10.1101/107623.)
The fundamental problem in fitting a mul...
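The quantities defined above can be written out in a short Python sketch. Only the sigmoid and the two variance bounds are taken from the text. Two parts are assumptions labeled as such in the code: the use of the sigmoid to keep a variance between the two bounds, and the rank-one outer-product form of the effect matrix for K = 1, since the decomposition formula itself is not recoverable from the text. All function names are hypothetical.

```python
import math
from statistics import pvariance


def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def variance_bounds(y):
    """V_max = V[y] and V_min = 1e-4 * V_max, as stated in the text.

    Using the population variance for V[y] is an assumption.
    """
    v_max = pvariance(y)
    return 1e-4 * v_max, v_max


def bounded_variance(z, v_min, v_max):
    """ASSUMPTION: map an unconstrained z into the interval (V_min, V_max)."""
    return v_min + (v_max - v_min) * sigmoid(z)


def rank_one_effects(theta_snp, theta_tis):
    """ASSUMPTION for K = 1: Theta[j][t] = theta_snp[j] * theta_tis[t]."""
    return [[s * t for t in theta_tis] for s in theta_snp]


v_min, v_max = variance_bounds([1.0, 2.0, 3.0, 4.0])
print(v_min, v_max, bounded_variance(0.0, v_min, v_max))
print(rank_one_effects([1.0, 2.0], [0.5, 0.0, 1.0]))
```

Under the rank-one assumption, a tissue with a zero tissue-level component receives no eQTL effect from any SNP, which is one way a factored model can express tissue specificity.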
The degree to which germline variation drives cancer development and shapes tumor phenotypes remains largely unexplored, possibly due to a lack of large-scale publicly available germline data for a cancer cohort. Here we called germline variants on 9,618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types. We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. Due to the inherent noise we chose to remove all 614 amplified DNA samples, including all acute myeloid leukemia and virtually all ovarian cancer samples, from the final dataset. This study demonstrates how insufficient quality control can lead to false positive germline-tumor type associations and draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.
There are substantial differences in the way exome sequence data were generated both across and within cancer types in TCGA. We observe that differences in sequence data generation introduced batch effects, that is, variation due to technical factors rather than true biological variation, in our variant data. Most notably, we observe that amplification of DNA prior to sequencing resulted in an excess of predicted damaging indel variants. We show how these batch effects can confound germline association analyses if not properly addressed.
Our study highlights the difficulties of working with large public genomic datasets such as TCGA, where samples are collected over time and across data centers, and particularly cautions against the use of amplified DNA samples for genetic association analyses.
Background: Cancer research to date has largely focused on somatically acquired genetic aberrations. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly due to a lack of accessible germline variant data. Here we called germline variants on 9618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types.
Results: We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. The samples affected by these technical artifacts include all acute myeloid leukemia and practically all ovarian cancer samples.
Conclusions: We demonstrate how technical artifacts induced by whole genome amplification of DNA can lead to false positive germline-tumor type associations and suggest TCGA whole genome amplified samples be used with caution. This study draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3770-y) contains supplementary material, which is available to authorized users.
* Corresponding author: rfriedman@ohanabio.com
Mendel's first law dictates that alleles segregate randomly during meiosis and are distributed to offspring with equal frequency, requiring sperm to be functionally independent of their genetic payload. Developing mammalian spermatids have been thought to accomplish this by freely sharing RNA from virtually all genes through cytoplasmic bridges, equalizing allelic gene expression across different genotypes. Applying single cell RNA sequencing to developing spermatids, we identify a large class of mammalian genes whose allelic expression ratio is informative of the haploid genotype, which we call genoinformative markers (GIMs). 29% of spermatid-expressed genes in mice and 47% in non-human primates are not uniformly shared, and instead show a confident allelic expression bias of at least 2-fold towards the haploid genotype. This property of GIMs was significantly conserved between individuals and between rodents and primates. Consistent with the interpretation of specific RNA localization resulting in incomplete sharing through cytoplasmic bridges, we observe a strong depletion of GIM transcripts from chromatoid bodies, structures involved in shuttling RNA across cytoplasmic bridges, and an enrichment for 3′ UTR motifs involved in RNA localization. If GIMs are translated and functional in the context of fertility, they would be able to violate Mendel's first law, leading to selective sweeps through a population. Indeed, we show that GIMs are enriched for signatures of positive selection, accounting for dozens of recent mouse, human, and primate selective sweeps. Intense selection at the sperm level risks evolutionary conflict between germline and somatic function, and GIMs show evidence of avoiding this conflict by exhibiting more testis-specific gene expression, paralogs, and isoforms than expression-matched control genes.
The widespread existence of GIMs suggests that selective forces acting at the level of individual mammalian sperm are much more frequent than commonly believed.
2 Author's summary
Mendel's first law dictates that alleles are distributed to offspring with equal frequency, requiring sperm carrying different genetics to be functionally equivalent. Despite a small number of known exceptions to this, it is widely believed that sharing of gene products through cytoplasmic bridges erases virtually all differences between haploid sperm. Here, we show that a large class of mammalian genes are not completely shared across these bridges, therefore causing sperm phenotype to correspond partly to haploid genotype. We term these genes "genoinformative markers" (GIMs) and show that their identity tends to be conserved from rodents to primates. Because some GIMs can link sperm genotype to function, they can be thought of as selfish genetic elements which lead to natural selection between sperm rather than between organisms, a violation of Mendel's first ...
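The 2-fold allelic bias criterion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' statistical procedure: the pooled read-count ratio, the pseudocount, the input layout and the function names are all assumptions, and the confidence assessment that the abstract refers to is left out entirely.

```python
def allelic_bias_fold(cells, pseudocount=1.0):
    """Fold bias of reads toward the allele matching each cell's genotype.

    cells: iterable of (haploid_genotype, reads_allele_a, reads_allele_b),
    where haploid_genotype is "A" or "B" (hypothetical input layout).
    Reads are pooled across cells; the pseudocount avoids division by zero.
    """
    matched = 0
    unmatched = 0
    for genotype, reads_a, reads_b in cells:
        if genotype == "A":
            matched += reads_a
            unmatched += reads_b
        elif genotype == "B":
            matched += reads_b
            unmatched += reads_a
        else:
            raise ValueError("genotype must be 'A' or 'B'")
    return (matched + pseudocount) / (unmatched + pseudocount)


def passes_fold_threshold(cells, min_fold=2.0):
    """True when the pooled bias reaches the 2-fold threshold from the text."""
    return allelic_bias_fold(cells) >= min_fold


# A gene whose transcripts stay mostly in the spermatid that made them.
biased = [("A", 9, 2), ("B", 1, 8), ("A", 7, 3)]
# A gene whose transcripts are shared evenly across cytoplasmic bridges.
shared = [("A", 5, 5), ("B", 5, 5)]
print(allelic_bias_fold(biased), passes_fold_threshold(biased))
print(allelic_bias_fold(shared), passes_fold_threshold(shared))
```

A fully shared gene gives a ratio near 1 regardless of genotype, whereas incomplete sharing shifts reads toward the allele that the haploid cell actually carries.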