The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazi Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Differentiated macrophages can self-renew in tissues and expand long-term in culture, but the gene regulatory mechanisms that accomplish self-renewal in the differentiated state have remained unknown. Here we show that in mice, the transcription factors MafB and c-Maf repress a macrophage-specific enhancer repertoire associated with a gene network controlling self-renewal. Single cell analysis revealed that, in vivo, proliferating resident macrophages can access this network by transient down-regulation of Maf transcription factors. The network also controls embryonic stem cell self-renewal but is associated with distinct embryonic stem cell-specific enhancers. This indicates that distinct lineage-specific enhancer platforms regulate a shared network of genes that control self-renewal potential in both stem and mature cells.
Although shotgun metagenomic sequencing of microbiome samples enables partial reconstruction of the strain-level community structure, it remains difficult to obtain high-quality microbial genome drafts without isolation and culture. Here we present a novel application of read clouds (short-read sequences tagged with long-range information) to microbiome samples. We present Athena, a de novo assembler that uses read clouds to improve metagenomic assemblies. We apply this approach to sequence stool samples from two healthy individuals and compare it to existing short-read and synthetic long-read metagenomic sequencing techniques. Read cloud metagenomic sequencing and Athena assembly produce the most complete individual genome drafts with high contiguity (>200 kbp N50, <10 contigs), even for bacteria that have relatively low (20×) raw short-read sequence coverage. We also sequence a complex marine sediment sample and generate 24 intermediate-quality genome drafts (>70% complete, <10% contaminated), nine of which are complete (>90% complete, <5% contaminated). Thus, our approach allows culture-free generation of high-quality microbial genome drafts using a single shotgun experiment.
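The contiguity and quality thresholds quoted above can be made concrete with a short sketch. N50 is the contig length L such that contigs of length L or longer together cover at least half of the assembly; the draft tiers below simply encode the completeness/contamination cut-offs stated in the abstract. The function names are hypothetical, and completeness and contamination percentages are assumed to be computed elsewhere (for example by a marker-gene tool), which is not shown.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def draft_tier(completeness: float, contamination: float) -> str:
    """Classify a genome draft with the thresholds quoted in the abstract.

    Both arguments are percentages (assumed precomputed). The stricter
    "complete" tier is checked first, since it is a subset of "intermediate".
    """
    if completeness > 90 and contamination < 5:
        return "complete"
    if completeness > 70 and contamination < 10:
        return "intermediate"
    return "below intermediate"


print(n50([100, 200, 300, 400]))  # 400 + 300 reaches half of 1000
print(draft_tier(95.0, 2.0))
```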
Gene expression microarrays are the most widely used technique for genome-wide expression profiling. However, microarrays do not perform well on formalin-fixed, paraffin-embedded tissue (FFPET). Consequently, microarrays cannot be effectively utilized to perform gene expression profiling on the vast majority of archival tumor samples. To address this limitation, we designed a novel procedure, 3′-end sequencing for expression quantification (3SEQ), for gene expression profiling from FFPET using next-generation sequencing. We performed gene expression profiling by 3SEQ and microarray on both frozen tissue and FFPET from two soft tissue tumors, desmoid-type fibromatosis (DTF) and solitary fibrous tumor (SFT) (n = 23 samples in total, each profiled with at least one of the four platform/tissue-preparation combinations). Analysis of 3SEQ data revealed many genes differentially expressed between the tumor types (FDR < 0.01) on both frozen tissue (∼9.6K genes) and FFPET (∼8.1K genes). Analysis of microarray data from frozen tissue revealed fewer differentially expressed genes (∼4.64K), and analysis of microarray data on FFPET revealed very few (69) differentially expressed genes. Functional gene set analysis of 3SEQ data from both frozen tissue and FFPET identified biological pathways known to be important in DTF and SFT pathogenesis and suggested several additional candidate oncogenic pathways in these tumors. These findings demonstrate that 3SEQ is an effective technique for gene expression profiling from archival tumor samples and may facilitate significant advances in translational cancer research.
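The abstract reports differential expression at FDR < 0.01 but does not say which correction was applied. As an illustration only, the sketch below assumes the Benjamini-Hochberg procedure, one standard way to turn per-gene p-values into FDR-adjusted values (q-values); the function names and the toy p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


def significant(p_values: Sequence[float], fdr: float = 0.01) -> List[int]:
    """Indices of tests (e.g. genes) whose adjusted p-value is below `fdr`."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < fdr]


toy_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(benjamini_hochberg(toy_p))
print(significant(toy_p, fdr=0.03))
```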
Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution, we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers, we built trees that relate the tissue samples within each patient. On the basis of these lineage trees, we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that the accumulation of somatic mutations is a result of increased ancestral cell division rather than of specific mutational mechanisms. In contrast to the highly advanced tumors that are the focus of much current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched for potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to the eventual development of invasive carcinoma.
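The abstract does not describe how the substitution spectra were tabulated. A common convention, assumed here, collapses each single-base substitution onto the pyrimidine (C or T) reference strand, giving six classes whose relative frequencies can then be compared across samples. The helper names below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-base substitution onto the pyrimidine reference strand."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in "GA":  # purine reference: report the complementary strand instead
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of (ref, alt) substitutions falling in each of the six classes."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {c: counts[c] / total if total else 0.0 for c in CLASSES}


# Hypothetical somatic SNVs from one sample: G>A folds onto C>T, A>C onto T>G.
print(spectrum([("C", "T"), ("G", "A"), ("T", "C"), ("A", "C")]))
```

Comparing these six-element vectors between samples or patients is one simple way to check whether spectra are "stable and consistent", as the abstract reports.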
The genetic programs that maintain leukemia stem cell (LSC) self-renewal and oncogenic potential have been well defined; however, the comprehensive epigenetic landscape that sustains LSC cellular identity and functionality is less well established. We report that LSCs in MLL-associated leukemia reside in an epigenetic state of relatively high genome-wide H3K4me3 and low H3K79me2 levels. LSC differentiation is associated with reversal of these broad epigenetic profiles, with concomitant down-regulation of crucial MLL target genes and of the LSC maintenance transcriptional program, which is driven by loss of H3K4me3 but not H3K79me2. The H3K4-specific demethylase KDM5B negatively regulates leukemogenesis in murine and human MLL-rearranged AML cells, demonstrating a crucial role for the H3K4 global methylome in determining leukemia stem cell fate.
Microfluidic partitioning of long genomic DNA fragments, and barcoding of shorter fragments derived from them, retains long-range information in short sequencing reads. Such read cloud approaches represent a powerful and cost-effective alternative to single-molecule long-read sequencing. We developed GROC-SVs, which uses read clouds for structural variant detection and assembly, and applied it to Illumina-sequenced 10× Genomics sarcoma and breast cancer data sets. Validation demonstrates a substantial improvement in the specificity of breakpoint detection compared to short-fragment sequencing at comparable sensitivity, and vice versa. The long-range information also facilitates sequence assembly of breakpoints; importantly, consecutive breakpoints closer together than the average length of the input DNA molecules can be assembled, with some events exhibiting remarkable complexity. We show that chromothriptic rearrangements occurred before copy number amplifications and that single-nucleotide and structural variants are not correlated. We predict significant advances in structural variant science using 10×/GROC-SVs and other read cloud-specific methods.
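A toy illustration of why read clouds help with breakpoint detection: reads carrying the same barcode usually derive from the same long input molecule, so two distant genomic windows that share many barcodes suggest the molecules span a rearrangement joining them. This is a simplified sketch of that general idea under stated assumptions (fixed-size windows, reads given as hypothetical (barcode, chromosome, position) tuples); it is not the GROC-SVs algorithm.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Read = Tuple[str, str, int]  # hypothetical (barcode, chromosome, position)
Window = Tuple[str, int]     # (chromosome, window index)


def barcodes_by_window(reads: Iterable[Read], window: int = 10_000) -> Dict[Window, Set[str]]:
    """Map each (chromosome, window index) to the set of barcodes observed there."""
    windows: Dict[Window, Set[str]] = defaultdict(set)
    for barcode, chrom, pos in reads:
        windows[(chrom, pos // window)].add(barcode)
    return windows


def candidate_pairs(windows: Dict[Window, Set[str]], min_shared: int = 2,
                    min_gap: int = 10) -> List[Tuple[Window, Window, int]]:
    """Distant window pairs sharing at least `min_shared` barcodes.

    Windows count as distant if they lie on different chromosomes or are at
    least `min_gap` windows apart. Quadratic in the number of windows, so
    this is only suitable for toy inputs.
    """
    keys = sorted(windows)
    hits = []
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            far = a[0] != b[0] or abs(a[1] - b[1]) >= min_gap
            shared = len(windows[a] & windows[b])
            if far and shared >= min_shared:
                hits.append((a, b, shared))
    return hits


reads = [("BC1", "chr1", 5_000), ("BC1", "chr8", 120_000),
         ("BC2", "chr1", 6_000), ("BC2", "chr8", 125_000),
         ("BC3", "chr1", 200_000)]
print(candidate_pairs(barcodes_by_window(reads)))
```

Real read cloud data contain many barcodes shared by chance, so practical methods must also model background barcode overlap; that is omitted here.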