2021
DOI: 10.1101/2021.05.07.442430
Preprint

Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence

Abstract: The U2AF1 gene is a core part of mRNA splicing machinery and frequently contains somatic mutations that contribute to oncogenesis in MDS, AML, and other cancers. A change introduced in the GRCh38 version of the human reference build prevents mutations in this gene from being detected by many variant calling pipelines. We describe the problem in detail and show that a modified GRCh38 reference build with unchanged coordinates can be used to ameliorate the issue. This reference is available at https://zenodo.org…
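
A reference can be modified without shifting coordinates by overwriting the unwanted sequence with N bases instead of deleting it, so every downstream position keeps its index. The sketch below illustrates that idea only. It assumes masking is the modification used, and the toy contig, the interval, and the helper name `mask_region` are hypothetical, not the real GRCh38 sequence or the authors' procedure.

```python
def mask_region(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] (0-based, half-open) with N bases, keeping length."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("interval out of bounds")
    return seq[:start] + "N" * (end - start) + seq[end:]


# Toy contig for illustration; not real reference sequence.
contig = "ACGTACGTACGT"
masked = mask_region(contig, 4, 8)
print(masked)  # ACGTNNNNACGT
```

Because the masked string has the same length as the input, annotations and variant coordinates defined on the original contig still apply to the masked one.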

Cited by 4 publications (4 citation statements)
References 16 publications
“…While previous studies concluded that variant calling performance is generally better on GRCh38 34, 35, our benchmark demonstrates that variant calls in some genes are less accurate on GRCh38 than GRCh37. Another group recently independently identified the importance of masking the extra copy of one gene (U2AF1/U2AF1L5) for cancer research 36. Our results identify that false duplications cause many of the discrepancies found recently between exome variant calls on GRCh37 and GRCh38 37. We produced similar benchmarks for both versions of the reference, so that scientists can better understand strengths and weaknesses of each reference, and test modifications to the reference such as the hs37d5 decoy for GRCh37 or the masked GRCh38 we propose here.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, we demonstrate that masking false duplications on GRCh38 greatly improves performance in these genes. Interestingly, another group recently independently identified the importance of masking the extra copy of one gene (U2AF1/U2AF1L5) for cancer research 36. Our results identify that false duplications cause many of the discrepancies found recently between exome variant calls on GRCh37 and GRCh38 37 and highlight the importance of our proposed masked GRCh38 genome.…”
Section: Discussion (mentioning)
confidence: 99%
“…A special approach was required to identify somatic variants in U2AF1 since an erroneous segmental duplication in the region of the gene in the hg38 reference genome resulted in a mapping score of zero during alignment of the FASTQ file 50. We developed a Rust-HTSLIB binary (https://github.com/weinstockj/pileup_region) to specifically identify reads associated with the U2AF1 variants S34F, S34Y, R156H, Q157P, and Q157R.…”
Section: Methods (mentioning)
confidence: 99%
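
The citation statement above says that reads in the affected region received a mapping score of zero. The sketch below shows why that matters: when reads are filtered on a minimum mapping quality, reads with a score of zero are dropped and the variant allele disappears from the counts. It is not the cited pileup_region tool. The `ReadBase` record, the toy pileup, and the threshold of 20 are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ReadBase(NamedTuple):
    mapq: int  # mapping quality assigned by the aligner
    base: str  # base the read shows at the position of interest


def allele_counts(reads: Iterable[ReadBase], min_mapq: int) -> Counter:
    """Count observed bases among reads with mapq >= min_mapq."""
    return Counter(r.base for r in reads if r.mapq >= min_mapq)


# Hypothetical pileup: every read has MAPQ 0 because the locus is duplicated.
pileup = [ReadBase(0, "C")] * 6 + [ReadBase(0, "T")] * 4

filtered = allele_counts(pileup, min_mapq=20)   # assumed quality threshold
unfiltered = allele_counts(pileup, min_mapq=0)  # targeted inspection, no filter
print(dict(filtered))    # {}
print(dict(unfiltered))  # {'C': 6, 'T': 4}
```

With the filter applied, no reads remain at the position, so the alternate allele cannot be called. Inspecting the region without the filter recovers both alleles.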