2018
DOI: 10.1038/s41467-018-06159-4

Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects

Abstract: Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of curr…

Cited by 175 publications (101 citation statements)
References 21 publications
“…Sequence data processing was performed periodically to produce genotype data "Freezes" that include all samples available at a given time. All sequence was remapped using BWA-MEM [83] to the hs38DH 1000 Genomes build 38 human genome reference including decoy sequences, following the protocol [84] published by Regier et al 2018 [85]. Variant discovery and genotype calling was performed jointly, across TOPMed Parent studies, for all samples in a given Freeze using the GotCloud [86] pipeline.…”
Section: Sequence Data Processing and Variant Calling
Citation type: mentioning
confidence: 99%
“…Using the pipeline from the Centers for the Common Disease Genomics project (Regier et al, 2018), FASTQ reads were aligned to the GRCh38 reference from the 1000 Genomes Project using BWA-MEM version 0.7.15. Reads were sorted and duplicates were removed with Picard, version 2.17.5; base quality score recalibration was then performed with the Genome Analysis Toolkit (GATK), v3.8-0-ge9d806836.…”
Section: WGS Variant Calling
Citation type: mentioning
confidence: 99%
“…The UK Biobank (UKB) is a large cohort study that consists of approximately half a million participants aged between 40 and 69 at recruitment, with extensive phenotypic records 18 65 and Functionally Equivalent (FE) 66.…”
Section: UK Biobank Data
Citation type: mentioning
confidence: 99%