While the cost of whole genome sequencing (WGS) is approaching the realm of routine medical tests, it remains too slow to help guide the management of many acute medical conditions. Rapid WGS is imperative in light of growing evidence of its utility in acute care, such as in diagnosis of genetic diseases in very ill infants, and genotype-guided choice of chemotherapy at cancer relapse. In such situations, delayed, empiric, or phenotype-based clinical decisions may result in substantial morbidity or mortality. We previously described a rapid WGS method, STATseq, with a sensitivity of >96% for nucleotide variants that allowed a provisional diagnosis of a genetic disease in 50 h. Here, improvements in sequencing run time, read alignment, and variant calling are described that enable a 26-h time to provisional molecular diagnosis with >99.5% sensitivity and specificity of genotypes. STATseq appears to be an appropriate strategy for acutely ill patients with potentially actionable genetic diseases.
Electronic supplementary material: The online version of this article (doi:10.1186/s13073-015-0221-8) contains supplementary material, which is available to authorized users.
Summary: The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in difficult-to-map regions and the Major Histocompatibility Complex (MHC). Starting with FASTQ files, 20 challenge participants applied their variant calling pipelines and submitted 64 variant callsets for one or more sequencing technologies (~35X Illumina, ~35X PacBio HiFi, and ~50X Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with the new GIAB benchmark sets and genome stratifications. Challenge submissions included a number of innovative methods for all three technologies, with graph-based and machine-learning methods scoring best for short-read and long-read datasets, respectively. New methods out-performed the 2016 Truth Challenge winners, and new machine-learning approaches combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
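Small-variant benchmarking of this kind is commonly summarized with precision, recall, and F1 computed from true-positive, false-positive, and false-negative counts. The following is a minimal sketch of that arithmetic only; the function name and the example counts are hypothetical and are not taken from the challenge results.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarize a callset against a truth set (illustrative helper, not challenge code).

    tp: calls matching the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    # Guard against empty denominators so an empty callset returns zeros.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the function.
metrics = benchmark_metrics(tp=90, fp=10, fn=10)
```

In practice these counts would be computed separately within each genome stratification (for example, difficult-to-map regions or the MHC), so that performance in challenging regions is not masked by genome-wide totals.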
Introduction: Next-generation whole-genome sequencing promises to enable dramatic expansion of precision oncology and personalized cancer care. To support this effort, it is critical to develop computational tools that can analyze sequence data accurately and with rapid turn-around-time. One particularly challenging problem is the calling of somatic variants in matched tumor and normal samples. We present the DRAGEN™ 3.5 somatic pipeline that performs 110x/40x whole-genome end-to-end analysis in under two hours with accuracy superior to that of all other tools we have benchmarked, including Mutect2/GATK4 and Strelka2. In addition, DRAGEN 3.5 is robust against variations in coverage, sequencing platform, sample preparation chemistry, and tumor purity. It also tolerates tumor-in-normal contamination, thereby making the pipeline applicable to late-stage solid tumors or hematological cancers. Methods: DRAGEN 3.5 replaces the legacy genotyping model (originally developed as part of MuTect2) with that of Strelka2. Whereas the MuTect2 model performs separate analyses on the tumor and normal samples, the Strelka2 model performs a joint analysis, allowing (1) detection of systematic errors that affect both samples simultaneously and (2) modeling of tumor-in-normal contamination, where the amount of contamination in the normal sample depends on the allele frequency observed in the tumor sample at the locus in question. In addition, we improved the probabilistic model of systematic error by incorporating models of strand bias and mismapping. Results: We benchmarked DRAGEN against Mutect2 (included in GATK 4.1.2) and Strelka2 (version 2.9.9) on five public synthetic and real datasets with known truth sets. DRAGEN greatly outperformed the other methods on all five datasets, producing 14-67% and 22-91% fewer false SNV calls, and 35-86% and 48-89% fewer false indel calls than Strelka2 and Mutect2 respectively.
DRAGEN also exhibits higher tolerance to tumor-in-normal (TiN) contamination than Strelka2, which is already equipped with a model tolerating TiN contamination. The average end-to-end workflow runtime of the DRAGEN somatic pipeline was 77 minutes, 75% and 830% faster than Strelka2 and Mutect2, respectively, when those tools take DRAGEN alignments as input. Conclusion: The DRAGEN pipeline enables reliable whole-genome analysis that can be scaled to large numbers of samples, leading to better tumor characterization and improved interpretation. We anticipate that it will ultimately fuel progress in oncology, cancer research and precision medicine. The DRAGEN 3.5 somatic pipeline can be run either locally on a DRAGEN server or remotely in the cloud via https://basespace.illumina.com. Citation Format: Konrad Scheffler, Sangtae Kim, Varun Jain, Jeffrey Yuan, Westley Sherman, Taylor O'Connell, Eric Ojard, Lisa Murray, Rami Mehio, Severine Catreux. Accuracy improvements in somatic whole-genome small-variant calling with the DRAGEN platform [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 5463.
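The idea of tumor-in-normal contamination described in this abstract can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the expected alternate-allele fraction in the normal sample is the tumor allele frequency scaled by a contamination fraction, and compares that hypothesis with a germline heterozygous one using a binomial likelihood. This is not the DRAGEN or Strelka2 model; the function names, the proportional relationship, and the read counts are all assumptions.

```python
import math


def expected_normal_alt_fraction(tumor_af: float, tin_fraction: float) -> float:
    """Assumed toy relationship: contamination scales the tumor allele frequency."""
    return tin_fraction * tumor_af


def binom_log_likelihood(alt_reads: int, depth: int, p: float) -> float:
    """Log-probability of seeing alt_reads out of depth reads at alt fraction p."""
    # Clamp p away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, 1e-6), 1 - 1e-6)
    log_choose = (math.lgamma(depth + 1)
                  - math.lgamma(alt_reads + 1)
                  - math.lgamma(depth - alt_reads + 1))
    return log_choose + alt_reads * math.log(p) + (depth - alt_reads) * math.log(1 - p)


# Hypothetical locus: 2 alt reads out of 40 in the normal, tumor AF of 0.4,
# and an assumed 10% tumor-in-normal contamination.
p_tin = expected_normal_alt_fraction(tumor_af=0.4, tin_fraction=0.1)
ll_somatic_with_tin = binom_log_likelihood(alt_reads=2, depth=40, p=p_tin)
ll_germline_het = binom_log_likelihood(alt_reads=2, depth=40, p=0.5)
```

Under these assumed numbers, the low-level alternate reads in the normal are far better explained by contamination than by a germline heterozygous variant, which is the intuition behind retaining such a call as somatic rather than discarding it.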
We present the DRAGEN™ somatic pipeline for calling small somatic variants from tumor samples, with or without paired normal samples. The DRAGEN somatic variant caller offers 1) a flexible architecture that can be used on a wide array of somatic use cases; 2) built-in noise models that provide robustness against various sources of noise artifacts (mapping, genome context, or sample specific); 3) joint analysis of tumor and normal samples in the tumor-normal workflow, yielding improved accuracy; and 4) FPGA acceleration for efficient run time. We demonstrate the speed and accuracy of the DRAGEN tumor-normal pipeline across a range of whole genome sequencing (WGS) datasets and compare against third party tools such as Mutect2/GATK4 [1] and Strelka2 [2]. DRAGEN secondary analysis outperforms all other tools with its ability to complete a 110x/40x T/N whole-genome analysis in less than two hours. It offers exceptional accuracy, with higher sensitivity and precision than third party tools. We also show that the DRAGEN T/N workflow supports analysis of liquid and late-stage solid tumors by tolerating tumor-in-normal (TiN) contamination.