Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult even for computationally sophisticated individuals. Indeed, the complexity of accessing and manipulating the data produced by these machines limits the scope and ease with which many professionals can answer scientific questions. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency, and to enable distributed and shared-memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools such as coverage calculators and single nucleotide polymorphism (SNP) callers. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects such as the 1000 Genomes Project and The Cancer Genome Atlas.
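To make the separation between framework and analysis concrete, the following is a minimal Python sketch of a locus-based MapReduce coverage calculator. It is not the GATK's API (the GATK itself is not written in Python); every name here (`Read`, `pileups`, `map_locus`, `reduce_depth`, `mean_coverage`, `shard_size`) is hypothetical. The assumptions: reads are reduced to 0-based, half-open `[start, end)` intervals on a single contig, and the "framework" side simply yields each locus together with the reads that overlap it. The analysis author writes only the per-locus `map` and the `reduce`; because the reduce step is associative, the region can be split into shards whose partial results are merged, which is the property that makes parallelization possible.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterator, List, Optional, Tuple

Acc = Tuple[int, int]  # (summed depth, number of loci visited)


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal aligned read: 0-based, half-open [start, end) on one contig."""
    start: int
    end: int


def pileups(reads: List[Read], region: Tuple[int, int]) -> Iterator[Tuple[int, List[Read]]]:
    """'Framework' side (sketch): yield every locus in the region with its overlapping reads."""
    lo, hi = region
    for locus in range(lo, hi):
        yield locus, [r for r in reads if r.start <= locus < r.end]


def map_locus(locus: int, overlapping: List[Read]) -> int:
    """Analysis side: the per-locus calculation, here the depth of coverage."""
    return len(overlapping)


def reduce_depth(acc: Acc, depth: int) -> Acc:
    """Analysis side: fold one per-locus result into the running accumulator."""
    total, n = acc
    return total + depth, n + 1


def accumulate(reads: List[Read], region: Tuple[int, int]) -> Acc:
    """Run map over every locus in a region, then reduce the results."""
    mapped = (map_locus(locus, ov) for locus, ov in pileups(reads, region))
    return reduce(reduce_depth, mapped, (0, 0))


def merge(a: Acc, b: Acc) -> Acc:
    """Combine accumulators from independent shards (associative, so order is irrelevant)."""
    return a[0] + b[0], a[1] + b[1]


def mean_coverage(reads: List[Read], region: Tuple[int, int],
                  shard_size: Optional[int] = None) -> float:
    """Mean depth over a region, optionally computed shard by shard and merged."""
    lo, hi = region
    if shard_size is None:
        shards = [region]
    else:
        shards = [(s, min(s + shard_size, hi)) for s in range(lo, hi, shard_size)]
    total, n = reduce(merge, (accumulate(reads, s) for s in shards), (0, 0))
    return total / n if n else 0.0
```

For example, with `reads = [Read(0, 4), Read(2, 6), Read(3, 5)]` over the region `(0, 6)`, the per-locus depths are 1, 1, 2, 3, 2, 1, so `mean_coverage(reads, (0, 6))` and `mean_coverage(reads, (0, 6), shard_size=4)` both return 10/6. A real framework would stream pileups from sorted, indexed alignments rather than scan every read at every locus, as this toy version does.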
Recent advances in sequencing technology make it possible to comprehensively catalogue genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (1) initial read mapping; (2) local realignment around indels; (3) base quality score recalibration; (4) SNP discovery and genotyping to find all potential variants; and (5) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We discuss the application of these tools, instantiated in the Genome Analysis Toolkit (GATK), to deep whole-genome, whole-exome capture, and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
The recent discovery of mutations in metabolic enzymes has rekindled interest in harnessing the altered metabolism of cancer cells for cancer therapy. One potential drug target is isocitrate dehydrogenase 1 (IDH1), which is mutated in multiple human cancers. Here, we examine the role of mutant IDH1 in fully transformed cells with endogenous IDH1 mutations. A selective R132H-IDH1 inhibitor (AGI-5198), identified through a high-throughput screen, blocked in a dose-dependent manner the ability of the mutant enzyme (mIDH1) to produce R-2-hydroxyglutarate (R-2HG). Under conditions of near-complete R-2HG inhibition, the mIDH1 inhibitor induced demethylation of histone H3K9me3 and expression of genes associated with gliogenic differentiation. Blockade of mIDH1 impaired the growth of IDH1-mutant, but not IDH1 wild-type, glioma cells without appreciable changes in genome-wide DNA methylation. These data suggest that mIDH1 may promote glioma growth through mechanisms beyond its well-characterized epigenetic effects.
A number of human cancers harbor somatic point mutations in the genes encoding isocitrate dehydrogenases 1 and 2 (IDH1 and IDH2). These mutations alter residues in the enzyme active sites and confer a gain of function in cancer cells, resulting in the accumulation and secretion of the oncometabolite (R)-2-hydroxyglutarate (2HG). We developed a small molecule, AGI-6780, that potently and selectively inhibits the tumor-associated mutant IDH2/R140Q. A crystal structure of AGI-6780 complexed with IDH2/R140Q revealed that the inhibitor binds in an allosteric manner at the dimer interface. The results of steady-state enzymology analysis were consistent with allostery and slow-tight binding by AGI-6780. Treatment with AGI-6780 induced differentiation of TF-1 erythroleukemia and primary human acute myelogenous leukemia cells in vitro. These data provide proof of concept that inhibitors targeting mutant IDH2/R140Q could be applied as a differentiation therapy for cancer.