2019
DOI: 10.1101/868570
Preprint

Thousands of missing variants in the UK BioBank are recoverable by genome realignment

Abstract: The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we note that thousands of variant calls are unexpectedly absent from the current dataset, with 641 genes showing zero variation. We show that the reason for this was an erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read alignment parameters to correctly handle the expanded set of contigs available in the h…

Cited by 14 publications (18 citation statements)
References 16 publications (11 reference statements)
“…We next annotated all variants with Variant Effect Predictor (v97) [57], extracted variants in exclusively X-linked recessive genes, and retained only the most severe consequence in canonical transcripts. We note that none of these X-linked recessive genes are affected by the recently-reported problem with this UKBB exome data release which is related to mapping errors [63].…”
Section: UK Biobank Analysis (mentioning)
confidence: 74%
“…It has recently been reported that the UK Biobank exome sequencing data is missing variant calls in regions where all reads were assigned MAPQ=0 (for more details, see Jia et al [62]).…”
Section: Processing SNV/indel Data From WES (mentioning)
confidence: 99%
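
The condition described in this statement (regions where every read has MAPQ=0) can be illustrated with a minimal Python sketch. The function name, the region labels, and the toy read list are hypothetical and exist only for illustration; a real analysis would read mapping qualities from alignment files rather than from an in-memory list.

```python
from collections import defaultdict

def regions_with_only_mapq0(reads):
    """Return regions in which every read has mapping quality 0.

    `reads` is an iterable of (region, mapq) pairs. This layout is an
    assumption made for the sketch, not a format used by the cited work.
    """
    best_mapq = defaultdict(int)
    for region, mapq in reads:
        # Track the highest MAPQ seen in each region.
        best_mapq[region] = max(best_mapq[region], mapq)
    # A region whose highest MAPQ is 0 contains only MAPQ=0 reads.
    return sorted(region for region, mapq in best_mapq.items() if mapq == 0)

# Hypothetical toy reads, not real data.
reads = [
    ("region_1", 0), ("region_1", 0),
    ("region_2", 0), ("region_2", 60),
    ("region_3", 60),
]
print(regions_with_only_mapq0(reads))
```

Variant callers commonly ignore reads with MAPQ=0, so a region flagged by this kind of check would be expected to yield no calls.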
“…In an analysis of this issue [74], Jia et al compared the number of exome variants per gene identified when using whole-exome sequencing data from the UK Biobank versus using data from gnomAD. They found 641 genes for which the UK Biobank exome data contains no variants whatsoever.…”
Section: Results and Analysis (mentioning)
confidence: 99%
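
The per-gene comparison described in this statement can be sketched as follows. This is a minimal illustration, assuming per-gene variant counts are already available as dictionaries; the function name, gene labels, and counts are hypothetical and are not taken from the cited analysis.

```python
def genes_missing_variants(ukb_counts, reference_counts):
    """Return genes with no variants in ukb_counts but at least one
    variant in reference_counts (for example, counts from gnomAD)."""
    return sorted(
        gene
        for gene, ref_n in reference_counts.items()
        # A gene absent from ukb_counts is treated as having zero variants.
        if ref_n > 0 and ukb_counts.get(gene, 0) == 0
    )

# Hypothetical toy counts, not real data.
ukb = {"GENE_A": 12, "GENE_B": 0, "GENE_C": 5}
gnomad = {"GENE_A": 15, "GENE_B": 9, "GENE_C": 4, "GENE_D": 3}
print(genes_missing_variants(ukb, gnomad))
```

Requiring a nonzero count in the reference dataset separates genes that are plausibly affected by a data problem from genes that simply show no variation anywhere.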