Background Whole-genome sequence (WGS) data may contain genetic variants that are, or are in high linkage disequilibrium with, the causative mutations underlying the genetic variation of polygenic traits. So far, using such information has given only limited increases in genomic prediction accuracy in dairy cattle studies, where one or a few breeds with limited diversity predominate. The objective of our study was to evaluate the accuracy of genomic prediction in a multi-breed Australian sheep population whose target individuals were relatively distantly related, when using imputed WGS genotypes. Methods Depending on the trait, between 9626 and 26,657 phenotyped animals were available for nine economically important sheep production traits, and all had imputed WGS genotypes. About 30% of the data were used to discover predictive single nucleotide polymorphisms (SNPs) in a genome-wide association study (GWAS), and the remaining data were used for training and validation of genomic prediction. Prediction accuracy using variants selected from imputed sequence data was compared with that using a standard 50k SNP array, for both genomic best linear unbiased prediction (GBLUP) and Bayesian methods (BayesR/BayesRC). Accuracy of genomic prediction was evaluated in two independent populations, each with low relatedness to the training set: one of purebred Merino and the other of crossbred Border Leicester × Merino sheep. Results A substantial improvement in prediction accuracy was observed when selected sequence variants were fitted alongside 50k genotypes, either as a separate variance component in GBLUP (2GBLUP) or as a separate category of SNPs in Bayesian analysis (BayesRC). Relative to an average accuracy of 0.27 with the 50k array in both validation sets, the average absolute increase in accuracy across traits was 0.083 (purebred) and 0.073 (crossbred) with 2GBLUP, and 0.102 and 0.087, respectively, with BayesRC.
The average gain in accuracy was smaller when selected sequence variants were treated in the same category as the 50k SNPs. Very little improvement over 50k prediction was observed when using all WGS variants. Conclusions Accuracy of genomic prediction in diverse sheep populations increased substantially when variants selected from whole-genome sequence data by an independent multi-breed GWAS were used, compared with genomic prediction using standard 50k genotypes.
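The 2GBLUP idea above, fitting selected sequence variants as a second genomic relationship matrix (GRM) alongside the 50k GRM, can be sketched with NumPy. This is a minimal illustration, not the authors' implementation: it assumes VanRaden-style GRMs, variance components that are already known (real analyses would estimate them, e.g. by REML), an overall mean as the only fixed effect, and simulated genotypes. The function names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes.

    M: array of shape (n_animals, n_snps).
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                # allele frequency of each SNP
    Z = M - 2.0 * p                         # centre genotypes per SNP
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def two_grm_gblup(y, train, valid, G1, G2, var1, var2, var_e):
    """BLUP of genomic values for `valid` animals from `train` phenotypes,
    with two genomic variance components (hypothetical 2GBLUP sketch).

    var1, var2 and var_e are assumed known here.
    """
    K = var1 * G1 + var2 * G2               # total genomic covariance
    y_t = y[train] - y[train].mean()        # overall mean as the only fixed effect
    V = K[np.ix_(train, train)] + var_e * np.eye(len(train))
    alpha = np.linalg.solve(V, y_t)         # V^-1 (y - mean)
    return K[np.ix_(valid, train)] @ alpha  # cov(valid, train) V^-1 (y - mean)


# Toy usage: a simulated "50k" panel and a small set of selected variants.
rng = np.random.default_rng(1)
n = 200
M50k = rng.binomial(2, 0.3, size=(n, 500)).astype(float)
Msel = rng.binomial(2, 0.2, size=(n, 50)).astype(float)
y = rng.normal(size=n)
train, valid = np.arange(150), np.arange(150, n)

G50k, Gsel = vanraden_grm(M50k), vanraden_grm(Msel)
gebv = two_grm_gblup(y, train, valid, G50k, Gsel, var1=0.2, var2=0.1, var_e=0.7)
print(gebv.shape)  # (50,)
```

Setting the second variance component to zero collapses the model to ordinary single-GRM GBLUP, which is the 50k baseline the abstract compares against.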
Background The use of whole-genome sequence (WGS) data for genomic prediction and association studies is highly desirable because the causal mutations should be present in the data. Sequencing of 935 sheep from a range of breeds provides the opportunity to impute sheep genotyped with single nucleotide polymorphism (SNP) arrays to WGS. This study evaluated the accuracy of imputation from SNP genotypes to WGS using this reference population of 935 sequenced sheep. Results The accuracy of imputation from the Ovine Infinium® HD BeadChip (~500k SNPs) to WGS was assessed for three target breeds: Merino, Poll Dorset and F1 Border Leicester × Merino. Imputation accuracy was highest for the Poll Dorset breed, even though the sequenced reference population contained more Merino than Poll Dorset individuals. In addition, empirical imputation accuracies were higher (by up to 1.7%) when using larger multi-breed reference populations than when using a smaller single-breed reference population. The mean accuracy of imputation across target breeds using the Minimac3 or the FImpute software was 0.94. The empirical imputation accuracy varied considerably across the genome; six chromosomes carried regions of 1 Mb or more with a mean imputation accuracy below 0.7. Imputation accuracy in five variant annotation classes ranged from 0.87 (missense) to 0.94 (intronic variants), with lower accuracy corresponding to higher proportions of rare alleles. The imputation quality statistic reported by Minimac3 (R²) had a clear positive relationship with the empirical imputation accuracy. Consequently, discarding imputed variants with an R² below 0.4 increased the mean empirical accuracy across target breeds to 0.97.
Although the accuracy of genomic prediction in a multi-breed sheep population with imputed WGS was less affected by filtering on R², the genomic heritability clearly tended to be lower when variants with an R² ≤ 0.4 were used. Conclusions The mean imputation accuracy was high for all target breeds and increased when smaller breed sets were combined into a multi-breed reference. We found that the Minimac3 imputation quality statistic (R²) was a useful indicator of empirical imputation accuracy, enabling removal of very poorly imputed variants before downstream analyses. Electronic supplementary material The online version of this article (10.1186/s12711-018-0443-5) contains supplementary material, which is available to authorized users.
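Empirical imputation accuracy of the kind reported above is usually a per-variant correlation between true genotypes and imputed dosages. The sketch below computes it and then the mean accuracy over variants that survive an R² ≥ 0.4 filter. The R² values are simply taken as input, as they would be read from the imputation software's output; the function names and toy data are hypothetical.

```python
import numpy as np


def per_variant_accuracy(true_geno, dosage):
    """Pearson correlation between true genotypes and imputed dosages for
    each variant (column). Monomorphic variants give NaN."""
    t = true_geno - true_geno.mean(axis=0)
    d = dosage - dosage.mean(axis=0)
    num = (t * d).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (d ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


def mean_accuracy_after_r2_filter(acc, r2, threshold=0.4):
    """Mean empirical accuracy over variants with imputation R2 >= threshold."""
    keep = (r2 >= threshold) & ~np.isnan(acc)
    return float(acc[keep].mean()), int(keep.sum())


# Toy data: 4 animals x 3 variants (true genotypes and imputed dosages).
true_geno = np.array([[0, 1, 1], [1, 1, 1], [2, 0, 1], [1, 2, 1]], dtype=float)
dosage = np.array([[0.1, 1.0, 1.0], [0.9, 1.2, 0.9], [1.8, 0.3, 1.1], [1.1, 1.0, 1.0]])
r2 = np.array([0.95, 0.30, 0.80])   # hypothetical software-reported R2 values

acc = per_variant_accuracy(true_geno, dosage)
mean_acc, n_kept = mean_accuracy_after_r2_filter(acc, r2)
print(acc, mean_acc, n_kept)
```

Here the second variant is dropped by the R² filter and the third is monomorphic in the true genotypes, so only the first variant contributes to the filtered mean.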
Background This study aimed (1) to compare the accuracies of genomic prediction for parasite resistance in sheep based on whole-genome sequence (WGS) data with those based on 50k and high-density (HD) single nucleotide polymorphism (SNP) panels; (2) to investigate whether using variants within quantitative trait loci (QTL) regions selected by regional heritability mapping (RHM) in an independent dataset improved accuracy more than using variants selected by genome-wide association studies (GWAS); and (3) to compare prediction accuracies between variants selected from WGS data and variants selected from the HD SNP panel. Results The accuracy of genomic prediction improved marginally, from 0.16 ± 0.02 and 0.18 ± 0.01 when using all variants from the 50k and HD genotypes, respectively, to 0.19 ± 0.01 when using all variants from WGS data. Fitting a genomic relationship matrix (GRM) from the selected variants alongside a GRM from the 50k SNP genotypes improved prediction accuracy substantially compared with fitting the 50k SNP genotypes alone. The gain in prediction accuracy was slightly larger when variants were selected from WGS data than when they were selected from the HD panel. When sequence variants that passed the GWAS threshold of 3 anywhere in the genome were selected, prediction accuracy improved by 0.05 (to 0.21 ± 0.01), whereas when selection was limited to sequence variants that passed the same GWAS threshold of 3 within regions identified by RHM, accuracy improved by 0.09 (to 0.25 ± 0.01). Conclusions Our results show that careful selection of sequence variants from QTL regions can improve the accuracy of genomic prediction for parasite resistance in sheep. These findings have important implications for genomic prediction in sheep.
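The variant-selection step contrasted above (genome-wide GWAS hits versus hits restricted to RHM regions) can be illustrated with a short sketch. It assumes that the GWAS threshold of 3 refers to −log10(P) > 3 and that RHM regions are supplied as (chromosome, start, end) intervals; both are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np


def select_variants(chrom, pos, neglog10p, threshold=3.0, regions=None):
    """Indices of variants with -log10(P) above `threshold`.

    If `regions` is given as (chrom, start, end) tuples, e.g. QTL regions from
    regional heritability mapping, only variants inside them are kept.
    """
    passed = neglog10p > threshold
    if regions is not None:
        in_region = np.zeros_like(passed)             # boolean mask, all False
        for c, start, end in regions:
            in_region |= (chrom == c) & (pos >= start) & (pos <= end)
        passed &= in_region
    return np.flatnonzero(passed)


chrom = np.array([1, 1, 1, 2, 2])
pos = np.array([100, 5_000, 9_000, 200, 7_000])
neglog10p = np.array([3.5, 2.0, 4.1, 5.0, 3.2])

sel_all = select_variants(chrom, pos, neglog10p)
sel_rhm = select_variants(chrom, pos, neglog10p, regions=[(1, 0, 6_000), (2, 6_000, 8_000)])
print(sel_all)  # [0 2 3 4]
print(sel_rhm)  # [0 4]
```

The selected indices would then define the genotype columns used to build the second GRM fitted alongside the 50k GRM.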
Background The objectives of this study were to investigate the accuracy of genotype imputation from low (12k) to medium (50k Illumina-Ovine) single nucleotide polymorphism (SNP) densities in purebred and crossbred Merino sheep based on a random or selected reference set, and to evaluate the impact of using imputed genotypes on the accuracy of genomic prediction. Methods Imputation validation sets consisted of random purebred or crossbred Merinos, whereas imputation reference sets were of variable size and included random purebred or crossbred Merinos or a group of animals selected for high genetic relatedness to animals in the validation set. The Beagle software program was used for imputation, and imputation accuracy was assessed as the Pearson correlation coefficient between observed and imputed genotypes. Genomic evaluation was performed by genomic best linear unbiased prediction, and its accuracy was evaluated as the Pearson correlation coefficient between genomic estimated breeding values, computed from either observed (12k/50k) genotypes or imputed genotypes with varying levels of imputation accuracy, and accurate progeny-test-based estimated breeding values. Results Imputation accuracy increased with the size of the reference set. Accuracy was higher for purebred Merinos imputed from other purebred Merinos (on average 0.90 to 0.95 based on 1000 to 3000 animals) than from crossbred Merinos (0.78 to 0.87 based on 1000 to 3000 animals) or from non-Merino purebreds (on average 0.50). The imputation accuracy for crossbred Merinos based on 1000 to 3000 other crossbred Merinos ranged from 0.86 to 0.88. Considerably higher imputation accuracy was observed when a selected reference set with a high genetic relationship to the target animals was used than with a random reference set of the same size (0.96 vs. 0.88, respectively).
Accuracy of genomic prediction based on 50k genotypes imputed with high accuracy (0.88 to 0.99) decreased only slightly (by 0.0 to 0.67% across traits) compared with using observed 50k genotypes. Accuracy of genomic prediction based on observed 12k genotypes was higher than that based on 50k genotypes imputed with low accuracy (0.62 to 0.86).
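One way to build a selected reference set with high genetic relationship to the target animals is to rank candidates by their mean genomic relationship to the targets. The abstract does not state the exact criterion, so ranking by the mean of the candidate-by-target GRM block is an assumption; the function name and toy GRM are hypothetical.

```python
import numpy as np


def select_related_reference(G, candidates, targets, size):
    """Return the `size` candidates with the highest mean genomic relationship
    to the target animals (hypothetical selection criterion)."""
    candidates = np.asarray(candidates)
    mean_rel = G[np.ix_(candidates, targets)].mean(axis=1)
    order = np.argsort(-mean_rel, kind="stable")   # most related first
    return candidates[order[:size]]


# Toy GRM for 5 animals; animal 4 is the imputation target.
G = np.array([
    [1.0, 0.1, 0.5, 0.0, 0.4],
    [0.1, 1.0, 0.0, 0.3, 0.0],
    [0.5, 0.0, 1.0, 0.1, 0.2],
    [0.0, 0.3, 0.1, 1.0, 0.0],
    [0.4, 0.0, 0.2, 0.0, 1.0],
])
ref = select_related_reference(G, candidates=[0, 1, 2, 3], targets=[4], size=2)
print(ref)  # [0 2]
```

A random reference set of the same size would simply sample from `candidates`; the abstract reports that the related set gave clearly higher imputation accuracy.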
Background The accuracy of genomic prediction depends largely on the number of animals with phenotypes and genotypes. In some industries, such as sheep and beef cattle, data are often available from a mixture of breeds, multiple strains within a breed, or crossbred animals. The objective of this study was to compare the accuracy of genomic prediction for several economically important traits in sheep when using data from purebreds, crossbreds or a combination of both in the reference population. Methods The reference populations were purebred Merinos, crossbreds of Border Leicester (BL), Poll Dorset (PD) or White Suffolk (WS) with Merinos, and combinations of purebred and crossbred animals. Genomic breeding values (GBV) were calculated by genomic best linear unbiased prediction (GBLUP), using a genomic relationship matrix computed from 48,599 Ovine SNP (single nucleotide polymorphism) genotypes. The accuracy of GBV was assessed in a group of purebred industry sires as the correlation coefficient between GBV and accurate estimated breeding values based on progeny records. Results The accuracy of GBV for Merino sires increased with a larger purebred Merino reference population, but decreased when a large purebred Merino reference population was augmented with records from crossbred animals. The GBV accuracy for the BL, PD and WS breeds based on crossbred data was unchanged or tended to decrease when more purebred Merinos were added to the crossbred reference population. The prediction accuracy for a particular breed was close to zero when the reference population did not contain any haplotypes of the target breed, except for some low accuracies obtained when predicting PD from WS and vice versa. Conclusions This study demonstrates that crossbred animals can be used for genomic prediction of purebred animals using 50k SNP marker density and GBLUP, but crossbred data provided lower accuracy than purebred data.
Including data from distant breeds in a reference population had a neutral to slightly negative effect on the accuracy of genomic prediction. Accounting for differences in marker allele frequencies between breeds had only a small effect on the accuracy of genomic prediction from crossbred or combined crossbred and purebred reference populations.
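Accounting for allele-frequency differences between breeds can be done, for example, by centring each animal's genotypes with its own breed's allele frequencies before forming the genomic relationship matrix. The sketch below shows one such hypothetical construction; it is not necessarily the method used in the study, and it treats crossbreds as their own group, which is a simplification. With a single group it reduces to the usual VanRaden GRM.

```python
import numpy as np


def grm_breed_centred(M, breed):
    """GRM in which each animal is centred with its own breed's allele
    frequencies (hypothetical construction; crossbreds form their own group)."""
    M = np.asarray(M, dtype=float)
    breed = np.asarray(breed)
    Z = np.empty_like(M)
    scale = 0.0
    for b in np.unique(breed):
        rows = breed == b
        p_b = M[rows].mean(axis=0) / 2.0            # breed-specific frequencies
        Z[rows] = M[rows] - 2.0 * p_b
        # weight each breed's expected heterozygosity by its share of animals
        scale += rows.mean() * 2.0 * np.sum(p_b * (1.0 - p_b))
    return Z @ Z.T / scale


rng = np.random.default_rng(0)
M = rng.binomial(2, 0.4, size=(6, 40))
breed = np.array(["Merino"] * 3 + ["BLxMerino"] * 3)
G_breed = grm_breed_centred(M, breed)
print(G_breed.round(2))
```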
Background The application of genomic selection to sheep breeding could substantially increase the profitability of wool production because accurate breeding values become available from single nucleotide polymorphism (SNP) data. Several key traits determine the value of wool and influence a sheep’s susceptibility to fleece rot and fly strike. Our aim was to predict genomic estimated breeding values (GEBV) and to compare three methods of combining information across traits to map polymorphisms that affect these traits. Methods GEBV for 5726 Merino and Merino crossbred sheep were calculated using BayesR and genomic best linear unbiased prediction (GBLUP) with 510,174 real and imputed SNPs for 22 traits (at yearling and adult ages), including wool production and quality traits and breech conformation traits that are associated with susceptibility to fly strike. Accuracies of these GEBV were assessed by fivefold cross-validation. We also devised and compared three approximate multi-trait analyses to map pleiotropic quantitative trait loci (QTL): a multi-trait genome-wide association study (GWAS) and two multi-trait methods that use the output of BayesR analyses. One BayesR method used local GEBV for each trait, while the other used the posterior probabilities that a SNP had an effect on each trait. Results BayesR and GBLUP gave similar average GEBV accuracies across traits (~0.22). BayesR accuracies were highest for wool yield and fibre diameter (>0.40) and lowest for skin quality and dag score (<0.10). Generally, accuracy was higher for traits with larger reference populations and higher heritability. In total, the three multi-trait analyses identified 206 putative QTL, of which 20 were common to all three analyses. The two BayesR multi-trait approaches mapped QTL more precisely than the multi-trait GWAS. We identified genes with known effects on hair growth (i.e.
FGF5, STAT3, KRT86, and ALX4) near SNPs with pleiotropic effects on wool traits. Conclusions The mean accuracy of genomic prediction across wool traits was around 0.22. The three multi-trait analyses identified 206 putative QTL across the ovine genome. Detailed phenotypic information helped to identify likely candidate genes. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0337-y) contains supplementary material, which is available to authorized users.
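The abstract does not give the formula of its multi-trait GWAS. One commonly used approximate statistic combines the signed t-values from single-trait GWAS as χ² = t′V⁻¹t, where t holds one SNP's t-values across traits and V is the correlation matrix of t-values across all SNPs; under the null it is approximately χ² with as many degrees of freedom as traits. The sketch below assumes this statistic and uses simulated null t-values; the function name is hypothetical.

```python
import numpy as np


def multi_trait_chi2(T):
    """Approximate multi-trait statistic per SNP, chi2_i = t_i' V^-1 t_i.

    T: (n_snps, n_traits) signed t-values from single-trait GWAS.
    V: correlation matrix of the t-values across SNPs (traits x traits).
    """
    T = np.asarray(T, dtype=float)
    V = np.corrcoef(T, rowvar=False)
    Vinv = np.linalg.inv(V)
    # quadratic form t_i' V^-1 t_i evaluated for every SNP (row) at once
    return np.einsum("ij,jk,ik->i", T, Vinv, T)


# Simulated t-values for 1000 null SNPs and 3 correlated traits.
rng = np.random.default_rng(7)
cov = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
T = rng.multivariate_normal(np.zeros(3), cov, size=1000)
stat = multi_trait_chi2(T)
print(stat.mean())  # expected to be near 3, the number of traits, for null SNPs
```

Using V rather than assuming independent traits prevents correlated traits (such as yearling and adult measures of the same wool trait) from being counted as independent evidence for pleiotropy.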
Background In this study, we assessed the accuracy of genomic prediction for carcass weight (CWT), marbling score (MS), eye muscle area (EMA) and back fat thickness (BFT) in Hanwoo cattle using genomic best linear unbiased prediction (GBLUP), weighted GBLUP (wGBLUP), and a BayesR model. For these models, we investigated the potential gain from using single nucleotide polymorphisms (SNPs) pre-selected by a genome-wide association study (GWAS) on imputed sequence data or by gene expression information. We used data on 13,717 animals with carcass phenotypes and imputed sequence genotypes, which were split into an independent GWAS discovery set of varying size and a remaining set for validation of prediction. Expression data came from a Hanwoo gene expression experiment on 45 animals. Results Using more animals in the reference set increased the accuracy of genomic prediction, whereas a larger independent GWAS discovery dataset improved identification of predictive SNPs. Using pre-selected SNPs from GWAS in GBLUP improved prediction accuracy by 0.02 for EMA and by up to 0.05 for BFT, CWT, and MS, compared with a standard 50k SNP array that gave accuracies of 0.50, 0.47, 0.58, and 0.47, respectively. Prediction accuracy for BFT and CWT increased when BayesR was applied with the 50k SNP array (by 0.02 and 0.03, respectively) and improved further when the 50k array was combined with the top SNPs (by 0.06 and 0.04, respectively). By contrast, BayesR gave limited improvement for EMA and MS. wGBLUP did not improve accuracy but increased prediction bias. From the RNA-seq experiment, we identified informative expression quantitative trait loci, which, when used in GBLUP, slightly improved prediction accuracy (by 0.01 to 0.02). SNPs located in genes whose expression was associated with differences in trait phenotype did not contribute to higher prediction accuracy.
Conclusions Our results show that, in Hanwoo beef cattle, pre-selecting SNPs from GWAS on imputed sequence data improves prediction accuracy only slightly, whereas SNPs selected on the basis of gene expression make no significant contribution. The benefit of statistical models that prioritize selected SNPs for estimating genomic breeding values is trait-specific and depends on the genetic architecture of each trait.
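wGBLUP differs from GBLUP only in its relationship matrix: each SNP contributes with its own weight, for example one derived from GWAS effect estimates. A minimal sketch of a weighted GRM, G_w = Z D Z′ / Σⱼ 2pⱼ(1 − pⱼ)dⱼ with D = diag(d), follows. The weighting scheme (squared stand-in effects times 2p(1 − p), scaled to mean 1) is an assumption for illustration, not necessarily the one used in the study; with all weights equal to 1 the matrix reduces to the unweighted VanRaden GRM used in GBLUP.

```python
import numpy as np


def weighted_grm(M, d):
    """Weighted GRM G_w = Z D Z' / sum_j 2 p_j (1 - p_j) d_j, D = diag(d)."""
    M = np.asarray(M, dtype=float)
    d = np.asarray(d, dtype=float)
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    # Z * d scales column j by d_j, so (Z * d) @ Z.T equals Z D Z'
    return (Z * d) @ Z.T / np.sum(2.0 * p * (1.0 - p) * d)


rng = np.random.default_rng(3)
M = rng.binomial(2, 0.3, size=(50, 300)).astype(float)
b = rng.normal(scale=0.1, size=300)         # stand-in for GWAS SNP effect estimates
p = M.mean(axis=0) / 2.0
d = b ** 2 * 2.0 * p * (1.0 - p)            # hypothetical weighting scheme
d = d / d.mean()                            # scale weights to mean 1
Gw = weighted_grm(M, d)                     # used in wGBLUP
G = weighted_grm(M, np.ones(300))           # equal weights: ordinary GBLUP GRM
```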
Reference populations for genomic selection usually consist of selected individuals, which may bias the prediction of genomic estimated breeding values (GEBV). In a simulation study, the bias and accuracy of GEBV were explored for various genetic models with individuals selectively genotyped in a typical nucleus breeding program. We compared three existing methods: best linear unbiased prediction of breeding values using pedigree-based relationships (PBLUP), using genomic relationships for genotyped animals only (GBLUP), and a single-step approach using both (SSGBLUP). For a scenario with no selection and random mating (RR), prediction was unbiased. However, lower accuracy and bias were observed for scenarios with selection and random mating (SR) or selection and positive assortative mating (SA). As expected, bias disappeared when all individuals were genotyped and used in GBLUP. SSGBLUP was more accurate than GBLUP, and its prediction bias was negligible under SR. However, PBLUP and SSGBLUP still showed bias under SA because of high inbreeding. SSGBLUP and PBLUP were unbiased provided that inbreeding was accounted for in the relationship matrices. Selective genotyping based on extreme phenotypic contrasts increased prediction accuracy, but prediction was biased when GBLUP was used. SSGBLUP corrected this bias while achieving higher accuracy than GBLUP. In a typical animal breeding program, where genotyping all animals is too expensive, it would be appropriate to genotype phenotypically contrasting selection candidates and use a single-step approach to obtain accurate and unbiased GEBV. Keywords: genomic selection, GWAS, prediction bias, selective genotyping, single-step GBLUP. How to cite this article: Gowane GR, Lee SH, Clark S, Moghaddar N, Al-Mamun HA, van der Werf JHJ. Effect of selection and selective genotyping for creation of reference on bias and accuracy of genomic prediction.
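SSGBLUP combines pedigree and genomic relationships through the inverse of the single-step relationship matrix, H⁻¹ = A⁻¹ + [[0, 0], [0, G⁻¹ − A₂₂⁻¹]], where A₂₂ is the pedigree relationship block of the genotyped animals and inbreeding enters through the diagonal of A. The sketch below builds H⁻¹ for a toy pedigree and adds a helper for one common measure of prediction bias, the regression slope of true on estimated breeding values (1 means no bias). Both are illustrations with hypothetical names and data, not the simulation code of the study; in practice G is often blended with A₂₂ to guarantee invertibility, which is omitted here.

```python
import numpy as np


def h_inverse(A, G, genotyped):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]] for single-step GBLUP.

    A: pedigree relationship matrix (inbreeding on the diagonal), all animals.
    G: genomic relationship matrix of the genotyped animals, ordered as `genotyped`.
    genotyped: indices of the genotyped animals within A.
    """
    idx = np.ix_(genotyped, genotyped)
    H_inv = np.linalg.inv(A)
    H_inv[idx] += np.linalg.inv(G) - np.linalg.inv(A[idx])
    return H_inv


def bias_slope(tbv, gebv):
    """Regression slope of true on estimated breeding values (1 = unbiased)."""
    tbv, gebv = np.asarray(tbv, float), np.asarray(gebv, float)
    return np.cov(tbv, gebv)[0, 1] / np.var(gebv, ddof=1)


# Toy pedigree: 0 and 1 are unrelated founders, 2 is their offspring, and
# 3 is an offspring of 0 and 2 (so it is inbred, F = 0.25).
A = np.array([
    [1.0, 0.0, 0.5, 0.75],
    [0.0, 1.0, 0.5, 0.25],
    [0.5, 0.5, 1.0, 0.75],
    [0.75, 0.25, 0.75, 1.25],
])
genotyped = [2, 3]
G = np.array([[1.02, 0.70], [0.70, 1.20]])   # hypothetical genomic relationships
H_inv = h_inverse(A, G, genotyped)
print(H_inv.round(3))
```

Only the genotyped block of A⁻¹ is modified, which is why non-genotyped animals can still benefit from genomic information through their pedigree links to genotyped relatives.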