Through alternative processing of pre-mRNAs, individual mammalian genes often produce multiple mRNA and protein isoforms that may have related, distinct or even opposing functions. Here we report an in-depth analysis of 15 diverse human tissue and cell line transcriptomes based on deep sequencing of cDNA fragments, yielding a digital inventory of gene and mRNA isoform expression. Analysis of mappings of sequence reads to exon-exon junctions indicated that 92-94% of human genes undergo alternative splicing (AS), ∼86% with a minor isoform frequency of 15% or more. Differences in isoform-specific read densities indicated that a majority of AS and of alternative cleavage and polyadenylation (APA) events vary between tissues, while variation between individuals was ∼2- to 3-fold less common. Extreme or ‘switch-like’ regulation of splicing between tissues was associated with increased sequence conservation in regulatory regions and with generation of full-length open reading frames. Patterns of AS and APA were strongly correlated across tissues, suggesting coordinated regulation of these processes, and sequence conservation of a subset of known regulatory motifs in both alternative introns and 3′ UTRs suggested common involvement of specific factors in tissue-level regulation of both splicing and polyadenylation.
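The 15% threshold above refers to the frequency of the less abundant isoform of a splicing event. A minimal sketch of that quantity for a simple two-isoform event follows, assuming it is estimated directly from junction read counts. The function name, the two-count input, and the example counts are illustrative assumptions, not the authors' pipeline, which works from isoform-specific read densities.

```python
def minor_isoform_frequency(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of junction reads supporting the less abundant isoform.

    Hypothetical helper: assumes a two-isoform event summarised by two
    raw read counts, with no length or mappability normalisation.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no junction reads observed for this event")
    return min(inclusion_reads, exclusion_reads) / total


def passes_threshold(inclusion_reads: int, exclusion_reads: int,
                     threshold: float = 0.15) -> bool:
    """True when the minor isoform reaches the given frequency threshold."""
    return minor_isoform_frequency(inclusion_reads, exclusion_reads) >= threshold


# Illustrative counts only (not data from the study).
print(passes_threshold(80, 20))  # minor isoform at 20 of 100 reads
print(passes_threshold(97, 3))   # minor isoform at 3 of 100 reads
```

Under these assumptions, an event with 80 inclusion and 20 exclusion reads would count toward the higher-frequency set, while one with 97 and 3 would not.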
DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally employed long (400–800 bp) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intra-species genetic variation. We report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterise four million SNPs and four hundred thousand structural variants, many of which are previously unknown. Our approach is effective for accurate, rapid and economical whole genome re-sequencing and many other biomedical applications.
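The relationship between read count, read length, and average depth mentioned above can be sketched with simple arithmetic. The genome size used here (about 3.1 billion bases) is an assumption for illustration and is not taken from the abstract. The calculation also ignores unmapped, duplicate, and low-quality reads, so it gives a lower bound on the reads actually required.

```python
import math


def mean_depth(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth if every read maps uniquely and uniformly."""
    return n_reads * read_length / genome_size


def reads_for_depth(target_depth: float, read_length: int, genome_size: int) -> int:
    """Minimum number of reads needed to reach a target average depth."""
    return math.ceil(target_depth * genome_size / read_length)


# Assumed haploid human genome size; not a figure from the abstract.
ASSUMED_GENOME_SIZE = 3_100_000_000

# Depth (30x) and read length (35 bases) are the values quoted above.
reads_needed = reads_for_depth(30, 35, ASSUMED_GENOME_SIZE)
print(f"Reads needed for 30x with 35-base reads: {reads_needed:,}")
```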
In the last decade, genome-wide transcriptome analyses have been routinely used to monitor tissue-, disease- and cell type-specific gene expression, but it has been technically challenging to generate expression profiles from single cells. Here we describe a novel and robust mRNA-Seq protocol (Smart-Seq) that is applicable down to single cell levels. Compared with existing methods, Smart-Seq has improved read coverage across transcripts, which significantly enhances detailed analyses of alternative transcript isoforms and identification of SNPs. We have determined the sensitivity and quantitative accuracy of Smart-Seq for single-cell transcriptomics by evaluating it on total RNA dilution series. Applying Smart-Seq to circulating tumor cells from melanomas, we identified distinct gene expression patterns, including new candidate biomarkers for melanoma circulating tumor cells. Importantly, our protocol can easily be utilized for addressing fundamental biological problems requiring genome-wide transcriptome profiling in rare cells.
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
Chronic lymphocytic leukemia is characterized by relapse after treatment and chemotherapy resistance. As in other malignancies, leukemia cells accumulate mutations during growth, forming heterogeneous cell populations that are subject to Darwinian selection and may respond differentially to treatment. There is therefore a clinical need to monitor changes in the subclonal composition of cancers during disease progression. Here, we use whole-genome sequencing to track subclonal heterogeneity in 3 chronic lymphocytic leukemia patients subjected to repeated cycles of therapy. We reveal different somatic mutation profiles in each patient and use these to establish probable hierarchical patterns of subclonal evolution, to identify subclones that decline or expand over time, and to detect founder mutations. We show that clonal evolution patterns are heterogeneous in individual patients. We conclude that genome sequencing is a powerful and sensitive approach to monitor disease progression repeatedly at the molecular level.
Introduction
Despite significant progress in the management of lymphomas and leukemias, relapse remains the major cause of death. Increased use of expensive targeted therapies and toxic chemotherapies (especially in the elderly) confronts us with an urgent need to improve response prediction for all cancer patients to reduce side effects and costs from ineffective treatment. Current diagnostic approaches to treatment selection, response monitoring, and relapse prediction are limited to single genes and apply only to a minority of hematologic cancers. This is at odds with modern concepts of tumor propagation and maintenance, which propose that every cell in an individual cancer is characterized by a combination of mutation events that comprise tumorigenic (driver) mutations, passive (passenger) mutations, and possibly predisposing germline risk variants.
Cancer cells propagate and diversify during tumor growth, resulting in a heterogeneous population of genotypically and phenotypically distinct subclones that are related in a hierarchical lineage. As the composition of the local environment changes, for example as a consequence of drug treatment, tumor cell populations adapt and evolve by Darwinian selection. [1][2][3] Whole-genome sequencing (WGS) of a single tumor sample can be used to generate a comprehensive catalog of variants that provides a snapshot of the cell population en masse at a particular time point. [2,4-6] However, over time and with continued evolution of the cancer, this snapshot becomes progressively less representative of the disease. Recent reports have described whole-tumor genomes from single patients or cohorts of individuals mostly at single time points and irrespective of treatment. [7][8][9][10] This approach has enabled identification of mutations representative and in some cases highly predictive of histologic cancer type, outcome, and/or treatment response. [11][12][13][14][15] Comparison of sequence data from primary and metastatic tumor samples, or from multiple lo...
We have developed an enhanced form of reduced representation bisulfite sequencing with extended genomic coverage, which resulted in greater capture of DNA methylation information of regions lying outside of traditional CpG islands. Applying this method to primary human bone marrow specimens from patients with Acute Myelogenous Leukemia (AML), we demonstrated that genetically distinct AML subtypes display diametrically opposed DNA methylation patterns. As compared to normal controls, we observed widespread hypermethylation in IDH mutant AMLs, preferentially targeting promoter regions and CpG islands neighboring the transcription start sites of genes. In contrast, AMLs harboring translocations affecting the MLL gene displayed extensive loss of methylation of an almost mutually exclusive set of CpGs, which instead affected introns and distal intergenic CpG islands and shores. When analyzed in conjunction with gene expression profiles, it became apparent that these specific patterns of DNA methylation result in differing roles in gene expression regulation. However, despite this subtype-specific DNA methylation patterning, a much smaller set of CpG sites is consistently affected in both AML subtypes. Most CpG sites in this common core of aberrantly methylated CpGs were hypermethylated in both AML subtypes. Therefore, aberrant DNA methylation patterns in AML do not occur in a stereotypical manner but rather are highly specific and associated with specific driving genetic lesions.
Monozygotic (MZ) or “identical” twins have been widely studied to dissect the relative contributions of genetics and environment in human diseases. In multiple sclerosis (MS), an autoimmune demyelinating disease and common cause of neurodegeneration and disability in young adults, disease discordance in MZ twins has been interpreted to indicate environmental importance in its pathogenesis [1–8]. However, genetic and epigenetic differences between MZ twins have been described, challenging the accepted experimental paradigm in disambiguating effects of nature and nurture [9–12]. Here, we report the genome sequences of one MS-discordant MZ twin pair and messenger RNA (mRNA) transcriptome and epigenome sequences of CD4+ lymphocytes from three MS-discordant, MZ twin pairs. No reproducible differences were detected between co-twins among ~3.6 million single nucleotide polymorphisms (SNPs) or ~0.2 million insertion-deletion polymorphisms (indels). Nor were any reproducible differences observed between siblings of the three twin pairs in HLA haplotypes, confirmed MS-susceptibility SNPs, copy number variations, mRNA and genomic SNP and indel genotypes, or expression of ~19,000 genes in CD4+ T cells. Only two to 176 differences in methylation of ~2 million CpG dinucleotides were detected between siblings of the three twin pairs, in contrast to ~800 methylation differences between T cells of unrelated individuals and several thousand differences between tissues or normal and cancerous tissues. In the first systematic effort to estimate sequence variation among MZ co-twins, we did not find evidence for genetic, epigenetic or transcriptome differences that explained disease discordance. These are the first female, twin and autoimmune disease individual genome sequences reported.
Recurrent gene fusions are a prevalent class of mutations arising from the juxtaposition of 2 distinct regions, which can generate novel functional transcripts that could serve as valuable therapeutic targets in cancer. Therefore, we aim to establish a sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy. Not only did a paired-end approach provide a greater dynamic range in comparison with single read based approaches, but it clearly distinguished the high-level "driving" gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level "passenger" gene fusions. Also, the comprehensiveness of a paired-end approach enabled the discovery of 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded previous approaches. Using the paired-end transcriptome sequencing approach, we observed readthrough mRNA chimeras, tissue-type restricted chimeras, converging transcripts, diverging transcripts, and overlapping mRNA transcripts. Last, we successfully used paired-end transcriptome sequencing to detect previously undescribed ETS gene fusions in prostate tumors. Together, this study establishes a highly specific and sensitive approach for accurately and comprehensively cataloguing chimeras within a sample using paired-end transcriptome sequencing.
Keywords: bioinformatics, gene fusions, prostate cancer, breast cancer, RNA-Seq