2020
DOI: 10.1111/ahg.12383

Thousands of missing variants in the UK Biobank are recoverable by genome realignment

Abstract: The UK Biobank is an unprecedented resource for human disease research. In March 2019, 49,997 exomes were made publicly available to investigators. Here we note that thousands of variant calls are unexpectedly absent from this dataset, with 641 genes showing zero variation. We show that the reason for this was an erroneous read alignment to the GRCh38 reference. The missing variants can be recovered by modifying read alignment parameters to correctly handle the expanded set of contigs available in the human ge…

Cited by 22 publications (12 citation statements)
References 18 publications
“…We next annotated all variants with Variant Effect Predictor (v97) [57], extracted variants in exclusively X-linked recessive genes, and retained only the most severe consequence in canonical transcripts. We note that none of these X-linked recessive genes are affected by the recently-reported problem with this UKBB exome data release which is related to mapping errors [63].…”
Section: Methods (mentioning)
confidence: 74%
“…With the increased adoption of the reference genome version GRCh38, many pipelines currently utilizing older reference genomes will need mechanisms to test their correct functionality in GRCh38 before the transition. Adopting a new reference genome can sometimes have unanticipated side‐effects, as was highlighted with an analysis of missing variant calls from realigning WGS datasets in the UK Biobank (Jia et al, 2020). A diverse set of variant scenarios involving several inheritance models, genic impacts, and mix of novel and known pathogenic variants has utility for testing these continuously updated pipelines, without the challenge of handling sensitive patient data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Second, it has been reported that there was an issue with the UKB functionally equivalent WES calling [53]. This mapping issue may have resulted in under-calling alternative alleles and therefore should not increase false positive findings. Third, we relied on a meta-analysis approach using summary statistics to perform our gene-based testing due to differences in sequencing platforms and genotyping calling within the multiple consortia contributing to the results.…”
Section: Discussion (mentioning)
confidence: 99%