2021
DOI: 10.1101/2021.01.06.425550
Preprint

Improving variant calling using population data and deep learning

Abstract: Large-scale population variant data is often used to filter and aid interpretation of variant calls in a single sample. These approaches do not incorporate population information directly into the process of variant calling, and are often limited to filtering, which trades recall for precision. In this study, we modify DeepVariant to add a new channel encoding population allele frequencies from the 1000 Genomes Project. We show that this model reduces variant calling errors, improving both precision and recall.…

Cited by 4 publications (8 citation statements)
References 52 publications
“…DeepTrio's ability to improve accuracy of calling, and to do so in a manner which is similar to human intuition regarding de novo variants, demonstrates an ability to capture rules which mirror general knowledge. This is similar to a recent demonstration which re-trained DeepVariant to use population allele frequencies [39]. It is a strong indicator that deep-learning based variant callers can be further improved by finding ways to expose information which captures the underlying biology of samples and populations.…”
Section: Discussion (supporting)
confidence: 86%
“…Furthermore, although not done here, one should be able to provide the single-sample calling step approximate genotype frequencies, to make this a non-issue in the joint calling step. Providing population data has indeed been shown to improve quality in concordance comparison with GIAB [2].…”
Section: Discussion (mentioning)
confidence: 99%
“…Deep-learning methods such as DeepVariant have also been augmented with new channels to encode allele frequencies [2] and sequence reads from parents [13] and shown to improve precision and recall in single sample calls.…”
Section: Introduction (mentioning)
confidence: 99%
“…The critical step of accurately identifying variants is being further improved through newer algorithms, some based on AI methods (Alharbi & Rashid, 2022; Olson et al, 2023; Poplin et al, 2018). For instance, a deep learning method (DeepVariant‐AF) developed recently by Google Health considers population allele frequencies from the 1000 Genomes Project and appears to call variants more accurately than prior methods (Chen, Kolesnikov, et al, 2023).…”
Section: Sequencing and Bioinformatics (mentioning)
confidence: 99%