2020
DOI: 10.1101/2020.12.11.422022
Preprint

An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development

Abstract: Accurate standards and extensive development datasets are the foundation of technical progress. To facilitate benchmarking and development, we sequence 9 samples, covering the Genome in a Bottle truth sets on multiple instruments (NovaSeq, HiSeqX, HiSeq4000, PacBio Sequel II System) and sample preparations (PCR-Free, PCR-Positive) for both whole genome and multiple exome kits. We benchmark pipelines, quantifying strengths and limitations for sequencing and analysis methods. We identify variability within and b…

Cited by 21 publications (19 citation statements)
References 39 publications (45 reference statements)
“…Exome models are trained on Agilent SureSelect v7, IDT-xGen, and Truseq capture kits. This data has been previously described and released [33]. DeepTrio is trained on examples from chromosomes 1-19.…”
Section: Modifying DeepVariant To Call Trios (mentioning)
confidence: 99%
“…The generation of sequencing data for training is described in detail in Baid et al 2020 [33] and the WGS and PacBio evaluation data in Olson et al 2020 [22]. In summary, all WGS and exome runs were conducted with 151-bp paired-end reads at 50x intended coverage from NovaSeq and HiSeqX platforms.…”
Section: Generation Of Sequencing Data (mentioning)
confidence: 99%
“…We further evaluated the performance of the models using two whole-exome sequencing (WES) datasets from a recently released set of genome and exome data for HG003 [28] (Figure 2c, Tables S2 and S3). Both datasets were aligned to GRCh37 and evaluated using the GIAB v4.2.1 truth set.…”
Section: Population Information Improves Variant Calling Performance (mentioning)
confidence: 99%
“…We trained the model following the procedure described in [2], with additional Illumina WGS datasets included [28]. Variants in chromosomes 1 to 19 are used as the training examples, and those in chromosome 21 and 22 are used for tuning.…”
Section: Model Training (mentioning)
confidence: 99%
“…This would improve the models and increase robustness of the best-performing variant callers on different ethnical backgrounds, and increase the opportunity for cross-benchmarking. Recently, a new set of high-quality reference datasets have been generated for benchmarking of variant callers (Baid et al, 2020), and evaluation of several variant callers' performance on these data corroborates the results…”
Section: Discussion (mentioning)
confidence: 80%