Preprint, 2018
DOI: 10.1101/270157
Best Practices for Benchmarking Germline Small Variant Calls in Human Genomes

Abstract: Assessing accuracy of NGS variant calling is immensely facilitated by a robust benchmarking strategy and tools to carry it out in a standard way. Benchmarking variant calls requires careful attention to definitions of performance metrics, sophisticated comparison approaches, and stratification by variant type and genome context. The Global Alliance for Genomics and Health (GA4GH) Benchmarking Team has developed standardized performance metrics and tools for benchmarking germline small variant calls. This Team …

Cited by 122 publications (190 citation statements)
References 25 publications
“…Moreover, k-mers naturally capture heterozygous insertion and deletion variants and are thus immune to the problems of calling these types of variants with a reference mapping approach. For example, consortiums such as the GA4GH exclude all variant calls within complex, repetitive regions of the genome 25 . In contrast, hap-mers inherently capture genetic context, regardless of the structural complexity surrounding them in the genome.…”
Section: Results
confidence: 99%
“…Alternative methods report phasing statistics from small variants (mostly SNPs) called with short-read mapping 8,[19][20][21][22] , or use benchmark genomes that have curated, phased variation call sets [23][24][25][26] . Both methods rely on a reference sequence as the primary source to detect heterozygous variations.…”
Section: Introduction
confidence: 99%
“…We evaluated the utility and accuracy of our benchmark MHC small variant set by comparing a DeepVariant v0.8 callset 18 from ~30x PacBio Sequel II 11 kb CCS reads to the benchmark, followed by manually curating putative false positives (FPs) and false negatives (FNs). When using hap.py with the vcfeval option to account for differences in representation of the many complex variants in the MHC 19 , there were 20074 TPs (matching 19320 calls in the benchmark VCF), 366 FPs, and 2176 FNs (of which 290 FPs and 260 FNs were genotyping errors or partial allele matches). To show our benchmark reliably identifies FPs and FNs, we manually curated 10 random genotyping errors or partial allele matches, as well as 10 random FPs and 10 random FNs that were not genotyping errors or partial allele matches.…”
Section: Create a Reliable Small Variant Benchmark Set From The Haplo…
confidence: 99%
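The counts quoted above can be turned into the GA4GH-style summary metrics directly. A minimal sketch, using only the numbers stated in the excerpt and the hap.py convention that true positives are counted separately against query calls and against benchmark (truth) calls:

```python
# Counts quoted in the excerpt (MHC benchmark vs. DeepVariant callset).
tp_query = 20074  # query calls matching the benchmark
tp_truth = 19320  # benchmark calls matched by the query
fp = 366          # query calls with no benchmark match
fn = 2176         # benchmark calls missed by the query

# GA4GH definitions: precision over query calls, recall over truth calls.
precision = tp_query / (tp_query + fp)
recall = tp_truth / (tp_truth + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

This yields a precision of roughly 0.982 and a recall of roughly 0.899 for the MHC callset described, before the manual curation of putative errors.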
“…We subdivide each list into separate registers of insertion, deletion start, and deletion end breakpoints. Corresponding truth-query pairs are input to an app developed by the Global Alliance for Genomics and Health (GA4GH Benchmarking) made available on precisionFDA 35 . In a distance-based comparison, it categorizes calls as true positives, false positives, or false negatives, and uses this information to derive precision and recall scores.…”
Section: Protocol For Indel Benchmarking
confidence: 99%
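The distance-based comparison described above can be sketched in a few lines: each query breakpoint is matched to the nearest unmatched truth breakpoint within a tolerance, and the resulting TP/FP/FN counts give precision and recall. This is an illustrative sketch, not the interface of the GA4GH Benchmarking app on precisionFDA; the function name and the `tol` parameter are assumptions.

```python
def benchmark_breakpoints(truth, query, tol=10):
    """Greedy distance-based matching of query to truth breakpoints.

    truth, query: lists of breakpoint coordinates (ints).
    tol: maximum distance (bp) for a query call to match a truth call
         (illustrative parameter, not from the source).
    Returns (tp, fp, fn, precision, recall).
    """
    truth_sorted = sorted(truth)
    matched = set()  # indices of truth breakpoints already claimed
    tp = 0
    for q in sorted(query):
        # Find the nearest unmatched truth breakpoint within tolerance.
        best = None
        for i, t in enumerate(truth_sorted):
            if i in matched:
                continue
            if abs(t - q) <= tol and (
                best is None or abs(t - q) < abs(truth_sorted[best] - q)
            ):
                best = i
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(query) - tp   # query calls with no truth match
    fn = len(truth) - tp   # truth calls never matched
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return tp, fp, fn, precision, recall
```

For example, `benchmark_breakpoints([100, 250, 400], [102, 260, 900])` matches the first two query calls within the 10 bp tolerance and leaves one FP and one FN, giving precision and recall of 2/3 each. In practice separate registers would be kept for insertion, deletion-start, and deletion-end breakpoints, as the paragraph above describes.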