2019
DOI: 10.1101/577338
Preprint

Improving imputation quality in BEAGLE for crop and livestock data

Abstract: Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and are thus optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets of doubled haploids of two European maize landraces, a commercial breeding line and a diversity panel in chicken, respectively, with different levels of genetic diversity and structure which can be t…

Cited by 20 publications (43 citation statements)
References 41 publications
“…Genotype calls were coded according to the allele counts of the B73 AGPv4 reference sequence (Jiao et al, 2017) (0 = homozygous for the reference allele, 2 = homozygous for the alternative allele). Imputation of missing values was performed separately for each landrace, using BEAGLE version 4.0 with parameters buildwindow=50, nsamples=50 (Browning and Browning, 2007; Pook et al, 2020). As the dataset only included doubled haploid lines and heterozygous calls were not expected, the DS (dosage) information of the BEAGLE output was used to recode remaining heterozygous calls.…”
Section: Ev _v6
Citation type: mentioning (confidence: 99%)
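The dosage-based recoding described in this statement can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' code: it assumes genotypes (coded 0/1/2 as alternative-allele counts) and BEAGLE DS values have already been parsed into equally sized matrices. The function name `recode_heterozygous` and the rounding rule (snap a heterozygous call to the nearer homozygote, with a dosage of exactly 1.0 assigned to 2) are hypothetical choices made for this sketch.

```python
from typing import List


def recode_heterozygous(
    genotypes: List[List[int]], dosages: List[List[float]]
) -> List[List[int]]:
    """Return a copy of `genotypes` with heterozygous calls (1) replaced.

    genotypes: rows = lines, columns = markers, coded as the count of the
        alternative allele (0 = homozygous reference, 2 = homozygous alternative).
    dosages: BEAGLE DS values (expected alternative-allele count, 0.0 to 2.0),
        same shape as `genotypes`.
    """
    if len(genotypes) != len(dosages):
        raise ValueError("genotype and dosage matrices differ in number of rows")
    recoded = []
    for g_row, ds_row in zip(genotypes, dosages):
        if len(g_row) != len(ds_row):
            raise ValueError("genotype and dosage rows differ in length")
        new_row = []
        for g, ds in zip(g_row, ds_row):
            if g == 1:
                # Assumed rule: snap to the nearer homozygote; DS == 1.0 goes to 2.
                new_row.append(2 if ds >= 1.0 else 0)
            else:
                # Homozygous calls are kept unchanged.
                new_row.append(g)
        recoded.append(new_row)
    return recoded


if __name__ == "__main__":
    genotypes = [[0, 1, 2], [1, 2, 0]]
    dosages = [[0.0, 1.7, 2.0], [0.4, 2.0, 0.1]]
    print(recode_heterozygous(genotypes, dosages))  # [[0, 2, 2], [0, 2, 0]]
```

Because doubled haploid lines are expected to be fully homozygous, the sketch only touches calls coded 1 and leaves every 0 and 2 as it was; how ties at a dosage of exactly 1.0 should be resolved is a judgment call the statement does not specify.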
“…Overall, the opportunities for identifying long, shared segments will be higher in SNP datasets from populations subjected to a recent history of intensive selection, as is commonly present in livestock and crop datasets. Recent work has suggested that the phasing accuracy for these kinds of datasets is extremely high (Pook et al 2019) and should, therefore, be sufficient for the application of HaploBlocker. For datasets containing less related individuals, as commonly present in human data, poor phasing accuracy can limit the applicability and usefulness of HaploBlocker.…”
Section: Results
Citation type: mentioning (confidence: 99%)
“…However, in the ROH studies we chose to adopt an extreme lower parameter due to the presence of studies which recommended not using the MAF threshold (possible underestimation) [51]. For the iHS and TD analyses, the database was computed on Beagle version 5.0, which provides faster and more accurate algorithms for genotype haplotyping/phasing [52].…”
Section: Horse Samples Were Collected From Brazil During the 36th Bra…
Citation type: mentioning (confidence: 99%)
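To make the MAF threshold mentioned in this statement concrete: the minor allele frequency (MAF) of a biallelic marker is the frequency of its less common allele, and a MAF threshold discards markers whose MAF falls below a chosen value. The sketch below assumes genotypes coded as alternative-allele counts (0/1/2), one list per marker, with missing calls already removed or imputed. The function names and the example threshold are hypothetical and do not reproduce the cited study's pipeline.

```python
from typing import List, Sequence


def minor_allele_frequency(marker: Sequence[int]) -> float:
    """MAF of one biallelic marker whose genotypes are coded 0/1/2."""
    if not marker:
        raise ValueError("marker has no genotype calls")
    # Each diploid genotype carries two alleles, so divide by 2n.
    alt_freq = sum(marker) / (2 * len(marker))
    return min(alt_freq, 1.0 - alt_freq)


def maf_filter(markers: List[List[int]], threshold: float) -> List[int]:
    """Indices of markers whose MAF is at least `threshold`."""
    return [
        i for i, marker in enumerate(markers)
        if minor_allele_frequency(marker) >= threshold
    ]


if __name__ == "__main__":
    markers = [[0, 0, 0, 2], [0, 1, 1, 2], [0, 0, 0, 0]]
    print(maf_filter(markers, 0.05))  # [0, 1]
```

With a threshold of 0.0 every marker is kept, including the monomorphic third one, which corresponds to running an analysis without a MAF filter.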