Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

O'Rawe, Jason; Jiang, Tao; Sun, Guangqing; Wu, Yiyang; Wang, Weimin; Hu, Jing-Chu; Bodily, Paul; Tian, Lifeng; Hákonarson, Hákon; Johnson, W. Evan; Wei, Zhi; Wang, Kai; Lyon, Gholson J.

doi:10.1186/gm432

Cited by 401 publications

(394 citation statements)

References 56 publications

(66 reference statements)


“…Recent publications have demonstrated hundreds of thousands of differences between variant calls from different whole human genome sequencing methods or different bioinformatics methods [5][6][7][8][9][10][11]. To understand these differences, we describe a high-confidence set of genome-wide genotype calls that can be used as a benchmark.…”

Section: Analysis (mentioning)

confidence: 99%

Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Zook, Chapman, Wang, et al. 2014. Nat Biotechnol.



“…The process of identifying mutations in NGS data can broadly be divided into three stages: generation of primary data performed by the sequencer, secondary data which includes derived DNA sequence and alignment of reads, and tertiary interpretation data, including the identification of variants and annotation. Two milestones in data analysis are the primary data (from which all results can be regenerated) and the tertiary interpreted variant files, which can be considered an end product of data analysis and are highly dependent on the steps used during data analysis (for instance, there is a low concordance between several commonly used bioinformatics pipelines for variant calling [33]). Even the commonly used BAM file [34] does not represent primary sequence data, but is the result of aligning sequence reads to a specific reference genome.…”

Section: Do We Have an Obligation to Report and Analyse IFs? (mentioning)

confidence: 99%

Towards a European consensus for reporting incidental findings during clinical NGS testing

et al. 2015


“…Our results also indicate that using independent library preparation replicates is an effective way to identify false positive calls [5]. Recently, O'Rawe et al. showed that NGS analysis of the same data set using different variant caller pipelines often resulted in low concordance [7]. Even though restricting one's focus to only the shared variant calls from multiple data analysis pipelines may be an effective way to eliminate some false positives, this approach will not be able to eliminate certain artifacts as effectively as the triplicate approach.…”

Section: Short Communication (mentioning)

confidence: 67%

Next Generation Sequencing and Its Clinical Applications: The Growing Pains

Chang, Marton. 2016. Next Generat Sequenc & Applic.


Short Communication

As the list of applications of next generation sequencing (NGS)-based assays continues to grow and the user network continues to expand from academic and pharmaceutical discovery research to clinical decision-making tests, the challenges and controversies continue to persist. Over the past year, the FDA held several workshops on the analytical and clinical validation of NGS tests that resulted in the release of two guidance documents, for which it has requested feedback from NGS stakeholders [1,2]. One key area of debate is defining the best method for establishing analytical validity of NGS tests (use of standards or the use of processes such as quality system regulations, QSRs). Some feel that QSRs are overly burdensome, impractical and cost-prohibitive; others feel the use of standards may be insufficiently rigorous. Our concern is how the standards would be implemented.

Another question is related to whether or how clinical labs should confirm novel variant calls. For example, if a whole exome sequencing (WES) of a sample results in 500 somatic mutation calls, it would be essentially impossible (both cost prohibitive and time-consuming) to confirm all 500 variants using an orthogonal method. In some clinical or hospital settings, verifying specific decision-making variant calls for a given patient with special disease condition using orthogonal methods might be feasible or justifiable. However, if the assay is to be used for the determination of patients' hyper-mutation status within a clinical trial, it might be unrealistic to confirm every potential variant call.

A third controversy is whether clinical databases could be used for clinical validation of novel mutations not in the literature, especially those mutations for which analytical confirmation was not explicitly performed. Unfortunately, if a public database of genotype-phenotype associations were created using historic data with limited variant confirmation or reproducibility measures in place, even if the data were from reputable labs, using such a database to support clinical validity of an NGS-based in vitro diagnostic might have unintended consequences, and could even increase the risk of getting incorrect diagnoses in the clinic [3,4]. Of course, if the database only contained those specific variants directly supported by the clinical evidence of genotype-phenotype association on a variant-by-variant basis, then that might be acceptable and useful. We can illustrate our concern with an extreme example: suppose a patient's sample that has 500 variants derived from its WES data is confirmed to be a responder for a given treatment, are we saying that now all 500 variants are considered validated and should be deposited into this public database? We would argue against this for several reasons, not the least of which would be the lack of confirmation of each mutation.

As for the use of standards in NGS-based in vitro diagnostics for germline diseases, although it should have substantial value in terms of serving as control samples...
