2016
DOI: 10.1186/s12859-016-1211-6

A hybrid computational strategy to address WGS variant analysis in >5000 samples

Abstract: Background: The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects are far beyond the capacity of typical computing resources available with most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to la…

Citations: cited by 9 publications (7 citation statements)
References: 32 publications (49 reference statements)
“…Here we describe WGS from the first 53,831 TOPMed samples selected from data sets that are now available via dbGaP controlled-access (Supplementary Tables 1 and 2); additional data will be made available as quality control (QC), variant calling and dbGaP curation are completed. Our work identifies and characterizes the rare variants that comprise the majority of human genomic variation 7, [12][13][14] and extends previous efforts that relied on genotyping arrays [15][16][17] , low-coverage WGS 7,8 , exome sequencing 2,12,18 , or analyses of smaller sample collections [19][20][21][22] . Since rare variants represent more recent and potentially more deleterious changes that can impact protein function, gene expression, or other biologically important elements, their discovery and study are crucial for understanding the genetics and biology of human health and disease 13,23,24 .…”
Section: Summary Paragraph (mentioning)
confidence: 74%
“…We also evaluated potential benefits from high coverage WGS relative to exome sequencing (depth >30X) 18 and low coverage WGS (depth >6X) 22 in 430 Framingham Heart Study samples. TOPMed WGS identified 23.8 million variants in these samples, compared with 20.5 million variants in low coverage sequencing analysis (a 16% increase).…”
Section: TOPMed WGS (mentioning)
confidence: 99%
“…The alignment was done using BWA [16] integrated in the Mercury pipeline [17]. We called biallelic SNPs across all 5297 samples, taking an ensemble variant calling approach goSNAP [18], which employs four variant callers GATK-HaplotypeCaller [19, 20] with gVCF option, GATK-UnifiedGenotyper [19, 20], SNPTools [21] and GotCloud [22], each enforced in a joint calling mode. To ensure a high quality variant call set, we applied a consensus filtering and selected 72,945,834 variant sites which were called at least in 3 out of all 4 callers.…”
Section: Methods (mentioning)
confidence: 99%
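The consensus filter described in the statement above (keep a site only if at least 3 of the 4 callers report it) can be sketched as follows. This is a minimal illustration, not the goSNAP implementation: the function name `consensus_sites`, the `(chromosome, position, ref, alt)` site key, and the example call sets are all assumptions made up for this sketch.

```python
from collections import Counter

# Hypothetical call sets: each caller maps to the set of sites it reported.
# Sites are keyed as (chromosome, position, ref, alt); the values are invented.
calls = {
    "HaplotypeCaller": {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")},
    "UnifiedGenotyper": {("1", 1000, "A", "G"), ("1", 2000, "C", "T")},
    "SNPTools": {("1", 1000, "A", "G"), ("2", 500, "G", "A")},
    "GotCloud": {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("3", 42, "T", "C")},
}


def consensus_sites(call_sets, min_callers=3):
    """Return the sites reported by at least `min_callers` of the callers."""
    support = Counter()
    for sites in call_sets.values():
        # Each caller contributes at most one vote per site because sites are sets.
        support.update(sites)
    return {site for site, votes in support.items() if votes >= min_callers}


consensus = consensus_sites(calls)
print(sorted(consensus))
```

In this toy input, only the two sites supported by three or more callers survive the filter; sites seen by one or two callers are dropped.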
“…Repositories of SNP data for TB community has been generated and is on the rise [24][25][26][27][28][29], but, all predictions till date have been done using individual samples, which have their own limitations and are known to include false-positives, due to low coverage, small read lengths and sequencing errors [30]. An effort to enlist the variation profile present within a cohort is still lacking as it becomes computationally challenging to predict SNP/Indel present across samples within a cohort during multi-sample variant prediction [31].…”
Section: Introduction (mentioning)
confidence: 99%
“…Approaches like Joint Variant Calling (JVC), (concept used for the first time on prokaryotes in this study) which predict variants present in a cohort, promises to overcome the shortcomings proposed by single sample variant calling (SVC) methods, as variants are analyzed simultaneously across all samples in a population [31,37,38]. JVC can predict variants for low coverage data in cohorts with high sensitivity.…”
Section: Introduction (mentioning)
confidence: 99%