Many genome variants shaping mammalian phenotype are hypothesized to regulate gene transcription and/or to be under selection. However, most of the evidence to support this hypothesis comes from human studies. Systematic evidence for regulatory and evolutionary signals contributing to complex traits in a different mammalian model is needed. Sequence variants associated with gene expression (expression quantitative trait loci [eQTLs]) and concentration of metabolites (metabolic quantitative trait loci [mQTLs]) and under histone-modification marks in several tissues were discovered from multiomics data of over 400 cattle. Variants under selection and evolutionary constraint were identified using genome databases of multiple species. These analyses defined 30 sets of variants, and for each set, we estimated the genetic variance the set explained across 34 complex traits in 11,923 bulls and 32,347 cows with 17,669,372 imputed variants. The per-variant trait heritability of these sets across traits was highly consistent (r > 0.94) between bulls and cows. Based on the per-variant heritability, conserved sites across 100 vertebrate species and mQTLs ranked the highest, followed by eQTLs, young variants, those under histone-modification marks, and selection signatures. From these results, we defined a Functional-And-Evolutionary Trait Heritability (FAETH) score indicating the functionality and predicted heritability of each variant. In an additional 7,551 cattle, the high FAETH-ranking variants had significantly increased genetic variances and genomic prediction accuracies in 3 production traits compared to the low FAETH-ranking variants. The FAETH framework combines the information of gene regulation, evolution, and trait heritability to rank variants, and the publicly available FAETH data provide a set of biological priors for cattle genomic selection worldwide.
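The ranking of variant sets by per-variant heritability can be illustrated with a minimal sketch. It assumes that per-variant heritability is the heritability explained by a set divided by the number of variants in that set; the function names and all numbers below are hypothetical and are not taken from the study.

```python
def per_variant_heritability(set_h2, n_variants):
    """Heritability explained by a variant set, averaged over its variants.

    Assumption: a simple ratio; the study's exact estimator may differ.
    """
    if n_variants <= 0:
        raise ValueError("n_variants must be positive")
    return set_h2 / n_variants


def rank_sets(sets):
    """Return set names ordered from highest to lowest per-variant heritability.

    `sets` maps a set name to a (set_h2, n_variants) tuple.
    """
    scores = {
        name: per_variant_heritability(h2, n) for name, (h2, n) in sets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values chosen only to illustrate the calculation.
example_sets = {
    "conserved": (0.10, 100_000),
    "eQTL": (0.05, 200_000),
    "selection_signature": (0.01, 500_000),
}
ranking = rank_sets(example_sets)
```

A small set that explains a moderate share of variance can outrank a much larger set that explains more variance in total, which is why the ranking is made per variant.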
Background: Sequence data can potentially increase the reliability of genomic predictions, because such data include causative mutations instead of relying on linkage disequilibrium (LD) between causative mutations and prediction variants. However, the location of the causative mutations is not known, and the presence of many variants that are in low LD with the causative mutations may reduce prediction reliability. Our objective was to investigate whether the use of variants at quantitative trait loci (QTL) that are identified in a multi-breed genome-wide association study (GWAS) for milk, fat and protein yield would increase the reliability of within- and multi-breed genomic predictions in Holstein, Jersey and Danish Red cattle. A wide range of scenarios that test different strategies to select prediction markers, for both within-breed and multi-breed prediction, were compared. Results: For all breeds and traits, the use of variants selected from a multi-breed GWAS resulted in substantial increases in prediction reliabilities compared to within-breed prediction using a 50 K SNP array. Reliabilities depended highly on the choice of the prediction markers, and the scenario that led to the highest reliability varied between breeds and traits. While genomic correlations across breeds were low for genome-wide sequence variants, the effects of the QTL variants that yielded the highest reliabilities were highly correlated across breeds. Conclusions: Our results show that the use of sequence variants, which are located near peaks of QTL that are detected in a multi-breed GWAS, can increase reliability of genomic predictions. Electronic supplementary material: The online version of this article (doi:10.1186/s12711-016-0259-0) contains supplementary material, which is available to authorized users.
Sequence data are expected to increase the reliability of genomic prediction by containing causative mutations directly, especially in cases where low linkage disequilibrium between markers and causative mutations limits prediction reliability, such as across-breed prediction in dairy cattle. In practice, the causative mutations are unknown, and prediction with only variants in perfect linkage disequilibrium with the causative mutations is not realistic, leading to a reduced reliability compared to knowing the causative variants. Our objective was to use sequence data to investigate the potential benefits of sequence data for the prediction of genomic relationships, and consequently reliability of genomic breeding values. We used sequence data from five dairy cattle breeds, and a larger number of imputed sequences for two of the five breeds. We focused on the influence of linkage disequilibrium between markers and causative mutations, and assumed that a fraction of the causative mutations was shared across breeds and had the same effect across breeds. By comparing the loss in reliability of different scenarios, varying the distance between markers and causative mutations, using either all genome wide markers from commercial SNP chips, or only the markers closest to the causative mutations, we demonstrate the importance of using only variants very close to the causative mutations, especially for across-breed prediction. Rare variants improved prediction only if they were very close to rare causative mutations, and all causative mutations were rare. Our results show that sequence data can potentially improve genomic prediction, but careful selection of markers is essential.
Genomic prediction is widely used to select candidates for breeding. Size and composition of the reference population are important factors influencing prediction accuracy. In Holstein dairy cattle, large reference populations are used, but this is difficult to achieve in numerically small breeds and for traits that are not routinely recorded. The prediction accuracy is usually estimated using cross-validation, requiring the full data set. It would be useful to have a method to predict the benefit of multibreed reference populations that does not require the availability of the full data set. Our objective was to study the effect of the size and breed composition of the reference population on the accuracy of genomic prediction using genomic BLUP and Bayes R. We also examined the effect of trait heritability and validation breed on prediction accuracy. Using these empirical results, we investigated the use of a formula to predict the effect of the size and composition of the reference population on the accuracy of genomic prediction. Phenotypes were simulated in a data set containing real genotypes of imputed sequence variants for 22,752 dairy bulls and cows, including Holstein, Jersey, Red Holstein, and Australian Red cattle. Different reference populations were constructed, varying in size and composition, to study within-breed, multibreed, and across-breed prediction. Phenotypes were simulated varying in heritability, number of chromosomes, and number of quantitative trait loci. Genomic prediction was carried out using genomic BLUP and Bayes R. We used either the genomic relationship matrix (GRM) to estimate the number of independent chromosomal segments and subsequently to predict accuracy, or the accuracies obtained from single-breed reference populations to predict the accuracies of larger or multibreed reference populations. 
Using the GRM overestimated the accuracy; this overestimation was likely due to close relationships among some of the reference animals. Consequently, the GRM could not be used to predict the accuracy of genomic prediction reliably. However, a method based on the prediction accuracies obtained by cross-validation with a small, single-breed reference population slightly overestimated the accuracy for a larger reference population of the same breed but gave a reasonably close estimate of the accuracy for a multibreed reference population. This method could be useful for making decisions regarding the size and composition of the reference population.
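The abstract does not state which formula was used. As an illustration only, the sketch below assumes the widely used deterministic approximation r = sqrt(N h² / (N h² + Me)), where N is the reference population size, h² the trait heritability, and Me the number of independent chromosomal segments. The function names are hypothetical. The second function inverts the formula so that an accuracy observed with a small reference population can be used to extrapolate to a larger one.

```python
import math


def predicted_accuracy(n_ref, h2, me):
    """Expected accuracy of genomic prediction under the assumed formula
    r = sqrt(N * h2 / (N * h2 + Me))."""
    return math.sqrt(n_ref * h2 / (n_ref * h2 + me))


def implied_me(n_ref, h2, accuracy):
    """Solve the same formula for Me, given an empirically observed accuracy."""
    r2 = accuracy ** 2
    return n_ref * h2 * (1.0 - r2) / r2


# Hypothetical example: accuracy 0.5 observed with 1,000 reference animals.
me_small = implied_me(n_ref=1000, h2=0.3, accuracy=0.5)

# Extrapolate to a reference population four times as large.
acc_large = predicted_accuracy(n_ref=4000, h2=0.3, me=me_small)
```

Deriving Me from an observed accuracy rather than from the GRM mirrors the second strategy described in the abstract, although the study's actual procedure may differ in its details.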
Methane is a greenhouse gas of high interest to the dairy industry, with 57% of Australia's dairy emissions attributed to enteric methane. Enteric methane emissions also constitute a loss of approximately 6.5% of ingested energy. Genetic selection offers a unique mitigation strategy to decrease the methane emissions of dairy cattle, while simultaneously improving their energy efficiency. Breeding objectives should focus on improving the overall sustainability of dairy cattle by reducing methane emissions without negatively affecting important economic traits. Common definitions for methane production, methane yield, and methane intensity are widely accepted, but there is not yet consensus for the most appropriate method to calculate residual methane production, as the different methods have not been compared. In this study, we examined 9 definitions of residual methane production. Records of individual cow methane, dry matter intake (DMI), and energy corrected milk (ECM) were obtained from 379 animals and measured over a 5-d period from 12 batches across 5 yr using the SF6 tracer method and an electronic feed recording system, respectively. The 9 methods of calculating residual methane involved genetic and phenotypic regression of methane production on a combination of DMI and ECM corrected for days in milk, parity, and experimental batch using phenotypes or direct genomic values. As direct genomic values (DGV) for DMI are not routinely evaluated in Australia at this time, DGV for FeedSaved, which is derived from DGV for residual feed intake and estimated breeding value for bodyweight, were used. Heritability estimates were calculated using univariate models, and correlations were estimated using bivariate models corrected for the fixed effects of year-batch, days in milk, and lactation number, and fitted using a genomic relationship matrix.
Residual methane production candidate traits had low to moderate heritability (0.10 ± 0.09 to 0.21 ± 0.10), with residual methane production corrected for ECM being the highest. All definitions of residual methane were highly correlated phenotypically (>0.87) and genetically (>0.79) with one another and moderately to highly with other methane candidate traits (>0.59), with high standard errors. The results suggest that direct selection for a residual methane production trait would result in indirect, favorable improvement in all other methane traits. The high standard errors highlight the importance of expanding data sets by measuring more animals for their methane emissions and DMI, or through exploration of proxy traits and combining data via international collaboration.
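One of the phenotypic definitions can be sketched as the residual from an ordinary least-squares regression of methane production on DMI and ECM. The sketch below is a simplification: it omits the corrections for days in milk, parity, and batch described in the abstract, the function names are hypothetical, and the six records are invented for illustration.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_methane(methane, dmi, ecm):
    """Residuals of methane regressed on an intercept, DMI, and ECM."""
    rows = [[1.0, d, e] for d, e in zip(dmi, ecm)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, methane)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(rows, methane)]


# Invented records (methane in g/d, DMI and ECM in kg/d), for illustration only.
methane = [420.0, 455.0, 390.0, 480.0, 440.0, 410.0]
dmi = [21.0, 23.5, 19.0, 25.0, 22.0, 20.5]
ecm = [28.0, 30.0, 26.5, 33.0, 31.0, 27.0]
rmp = residual_methane(methane, dmi, ecm)
```

By construction the residuals average zero and are uncorrelated with DMI and ECM, so a cow with a negative value emits less methane than expected for her intake and milk output.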
Background: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows. Results: With the simulated dataset, we found that prediction accuracy was highly increased for a breed that was not represented in the training population for sequence data compared to HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. First, dropping variants from each chromosome and reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population.
With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs. Conclusions: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation. Electronic supplementary material: The online version of this article (doi:10.1186/s12711-017-0347-9) contains supplementary material, which is available to authorized users.
The objective of this study was to compare mapping precision and power of within-breed and multibreed genome-wide association studies (GWAS) and to compare the results obtained by the multibreed GWAS with 3 meta-analysis methods. The multibreed GWAS was expected to improve mapping precision compared with a within-breed GWAS because linkage disequilibrium is conserved over shorter distances across breeds than within breeds. The multibreed GWAS was also expected to increase detection power for quantitative trait loci (QTL) segregating across breeds. GWAS were performed for production traits in dairy cattle, using imputed full genome sequences of 16,031 bulls, originating from 6 French and Danish dairy cattle populations. Our results show that a multibreed GWAS can be a valuable tool for the detection and fine mapping of quantitative trait loci. The number of QTL detected with the multibreed GWAS was larger than the number detected by the within-breed GWAS, indicating an increase in power, especially when the 2 Holstein populations were combined. The largest number of QTL was detected when all populations were combined. The analysis combining all breeds was, however, dominated by Holstein, and QTL segregating in other breeds but not in Holstein were sometimes overshadowed by larger QTL segregating in Holstein. Therefore, the GWAS combining all breeds except Holstein was useful to detect such peaks. Combining all breeds except Holstein resulted in smaller QTL intervals on average, but this outcome was not the case when the Holstein populations were included in the analysis. Although no decrease in the average QTL size was observed, mapping precision did improve for several QTL. Out of 3 different multibreed meta-analysis methods, the weighted z-scores model resulted in the most similar results to the full multibreed GWAS and can be useful as an alternative to a full multibreed GWAS. 
Differences between the multibreed GWAS and the meta-analyses were larger when different breeds were combined than when the 2 Holstein populations were combined.
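A weighted z-score meta-analysis of per-population GWAS results can be sketched as follows. The abstract does not specify the weights; the sketch assumes the common choice of the square root of each population's sample size, and the function names are hypothetical.

```python
import math


def weighted_z(z_scores, sample_sizes):
    """Combine per-population z-scores for one variant.

    Assumption: weights are sqrt(sample size); other weightings are possible.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical variant with the same modest signal in two equal-sized populations.
combined = weighted_z([2.0, 2.0], [100, 100])
p_single = two_sided_p(2.0)
p_combined = two_sided_p(combined)
```

Because the z-scores carry the sign of the estimated effect, a variant with consistent effects across populations gains significance when combined, whereas effects in opposite directions cancel out.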