2021
DOI: 10.1158/0008-5472.can-20-2151

Misannotated Multi-Nucleotide Variants in Public Cancer Genomics Datasets Lead to Inaccurate Mutation Calls with Significant Implications

Abstract: Although next-generation sequencing is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as a multi-nucleotide variant (MNV). With this approach, the amino acid change from the individual SNV within a codon could be different from the amino acid change based on the MNV that results from com…
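
The effect described in the abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a made-up reference codon (GGT) carrying two adjacent substitutions on the same haplotype; the function names `apply_snvs` and `protein_change` are hypothetical and are not taken from the paper or from any variant caller.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions.

    Positions are 0-based offsets within the codon (illustrative convention).
    """
    bases = list(codon)
    for pos, alt in snvs:
        bases[pos] = alt
    return "".join(bases)


def protein_change(ref_codon, snvs):
    """Describe the amino acid change as 'ref>alt' in one-letter codes."""
    alt_codon = apply_snvs(ref_codon, snvs)
    return f"{CODON_TABLE[ref_codon]}>{CODON_TABLE[alt_codon]}"


# Hypothetical example: two adjacent SNVs in the same codon, same haplotype.
ref = "GGT"                      # glycine
snv_1 = (0, "A")                 # GGT -> AGT when annotated alone
snv_2 = (1, "A")                 # GGT -> GAT when annotated alone

alone_1 = protein_change(ref, [snv_1])           # annotated as a lone SNV
alone_2 = protein_change(ref, [snv_2])           # annotated as a lone SNV
combined = protein_change(ref, [snv_1, snv_2])   # annotated as one MNV

print(alone_1, alone_2, combined)
```

In this toy case each SNV annotated alone predicts a different residue (serine, aspartate) from the one the combined MNV actually encodes (asparagine), which is the kind of discrepancy the abstract describes.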

Cited by 7 publications (6 citation statements)
References 46 publications
“…Recently, it has been recognized that in NGS analyses, MNVs can be miscalled resulting in misannotations and incorrect aminoacid prediction by GATK variant calling. Considering the negative impact on clinical care, novel NGS variant callers that incorporate haplotype information and performs phasing of SNVs have been recommended (Wang et al, 2020b; Srinivasan et al, 2021).…”
Section: Discussion (mentioning)
confidence: 99%
“…Both FHIR Genomics Operations and GA4GH leverage canonical SPDI format, making annotation of these variant types relatively straightforward. [27][28][29] Our approach has been to leverage variant phase data to combine SNVs into MNVs where possible in order to compute additional potentially relevant annotations.…”
Section: Discussion (mentioning)
confidence: 99%
“…Dynamic annotation of MNVs is complicated by the fact that (1) some variant callers will report MNVs whereas others only report component SNVs; (2) knowledge bases may contain MNVs and/or SNVs depending on what was submitted; and (3) bioinformatics tools may predict different molecular consequences for an MNV versus component SNVs. In fact, literature suggests that misannotation of MNVs is common and carries significant clinical implications 27‐29. Our approach has been to leverage variant phase data to combine SNVs into MNVs where possible in order to compute additional potentially relevant annotations.…”
Section: Discussion (mentioning)
confidence: 99%
“…The integration and interpretation of proteogenomic measurements, and ultimately, the test of their value, comes from the development of novel computational approaches. Data integration is first challenged by the fact that data exist on varying scales—genetic mutations are often assigned as discrete types of calls depending on the type of change in the DNA sequence (129, 130), while transcript measurements represent absolute changes in the mRNA relative to the length of the transcript and the depth of sequencing (131), and protein measurements are log ratio values representing the amount of sample measured relative to a standard control (132). Methods to overcome these challenges depend on the type of analysis at hand: Namely, nonnegative matrix factorization helps identify clusters of samples that behave similarly across scales (133 135) using all types of omic data, differential expression analyses can be performed between experimental conditions, and overlap between those features compared.…”
Section: Computational Data Integration and Bioinformatic Challenges (mentioning)
confidence: 99%