Background: The incorporation of genomic coefficients into the numerator relationship matrix allows estimation of breeding values using all phenotypic, pedigree and genomic information simultaneously. In such a single-step procedure, genomic and pedigree-based relationships have to be compatible. As there are many options to create genomic relationships, there is a question of which is optimal and what the effects of deviations from optimality are.
Methods: Data on litter size (total number born per litter) for 338,346 sows were analyzed. Illumina PorcineSNP60 BeadChip genotypes were available for 1,989 animals. Analyses were carried out with the complete data set and with a subset of genotyped animals and three generations of pedigree (5,090 animals). A single-trait animal model was used to estimate variance components and breeding values. Genomic relationship matrices were constructed using allele frequencies equal to 0.5 (G05), equal to the average minor allele frequency (GMF), or equal to observed frequencies (GOF). A genomic matrix considering random ascertainment of allele frequencies was also used (GOF*). A normalized matrix (GN) was obtained to have average diagonal coefficients equal to 1. The genomic matrices were combined with the numerator relationship matrix to create H matrices.
Results: In G05 and GMF, both diagonal and off-diagonal elements were on average greater than the pedigree-based coefficients. In GOF and GOF*, the average diagonal elements were smaller than pedigree-based coefficients. The mean of off-diagonal coefficients was zero in GOF and GOF*. Choices of G with average diagonal coefficients different from 1 led to greater estimates of additive variance in the smaller data set. The correlations between EBV and genomic EBV (n = 1,989) were 0.79 using G05, 0.79 using GMF, 0.78 using GOF, 0.79 using GOF*, and 0.78 using GN. Accuracies calculated by inversion increased with all genomic matrices. The accuracies of genomic-assisted EBV were inflated in all cases except when GN was used.
Conclusions: Parameter estimates may be biased if the genomic relationship coefficients are on a different scale than pedigree-based coefficients. A reasonable scaling may be obtained by using observed allele frequencies and re-scaling the genomic relationship matrix to obtain average diagonal elements of 1.
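The effect of the allele-frequency choice on the scale of G can be illustrated with a minimal sketch. It assumes the common construction G = ZZ'/k with Z = M - 2p and k = 2Σp(1 - p), and, for the normalized matrix, k = trace(ZZ')/n; the abstract does not state the formulas, so this form is an assumption. The function name `genomic_relationship` and the simulated genotypes are hypothetical, and GOF* is not covered.

```python
import numpy as np

def genomic_relationship(M, p, normalize=False):
    """Build a genomic relationship matrix from genotypes coded 0/1/2.

    M: (n_animals, n_snps) genotype matrix.
    p: allele frequency used for centering (scalar or one value per SNP).
    normalize: if True, scale so the average diagonal equals 1 (GN-style);
               otherwise scale by 2 * sum(p * (1 - p)).
    The formulas are an assumed, commonly used construction.
    """
    M = np.asarray(M, dtype=float)
    p = np.broadcast_to(np.asarray(p, dtype=float), (M.shape[1],))
    Z = M - 2.0 * p                      # center genotypes by 2p
    ZZt = Z @ Z.T
    if normalize:
        k = np.trace(ZZt) / M.shape[0]   # forces mean diagonal of 1
    else:
        k = 2.0 * np.sum(p * (1.0 - p))
    return ZZt / k

# Simulated genotypes, for illustration only (not the data of the study).
rng = np.random.default_rng(0)
true_p = rng.uniform(0.05, 0.95, size=500)
M = rng.binomial(2, true_p, size=(20, 500))
p_obs = M.mean(axis=0) / 2.0             # observed allele frequencies

G05 = genomic_relationship(M, 0.5)
GMF = genomic_relationship(M, np.minimum(p_obs, 1.0 - p_obs).mean())
GOF = genomic_relationship(M, p_obs)
GN = genomic_relationship(M, p_obs, normalize=True)

for name, G in [("G05", G05), ("GMF", GMF), ("GOF", GOF), ("GN", GN)]:
    off_diag = G[~np.eye(len(G), dtype=bool)]
    print(name, "mean diagonal:", G.diagonal().mean(),
          "mean off-diagonal:", off_diag.mean())
```

Centering with frequencies observed in the same sample makes every column of Z sum to zero, so the elements of GOF average to zero overall.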
Although common datasets are an important resource for the scientific community and can be used to address important questions, genomic datasets of a meaningful size have not generally been available in livestock species. We describe a pig dataset that PIC (a Genus company) has made available for comparing genomic prediction methods. We also describe genomic evaluation of the data using methods that PIC considers best practice for predicting and validating genomic breeding values, and we discuss the impact of data structure on accuracy. The dataset contains 3534 individuals with high-density genotypes, phenotypes, and estimated breeding values for five traits. Genomic breeding values were calculated using BayesB, with phenotypes and de-regressed breeding values, and using a single-step genomic BLUP approach that combines information from genotyped and un-genotyped animals. The genomic breeding value accuracy increased with increased trait heritability and with increased relationship between training and validation. In nearly all cases, BayesB using de-regressed breeding values outperformed the other approaches, but the single-step evaluation performed only slightly worse. This dataset was useful for comparing methods for genomic prediction using real data. Our results indicate that validation approaches accounting for relatedness between populations can correct for potential overestimation of genomic breeding value accuracies, with implications for genotyping strategies to carry out genomic selection programs.
Background: Genomic selection has gained much attention, and its main goal is to increase predictive accuracy and genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates), small n (number of observations) problem have dealt only with continuous traits, but many important traits in livestock are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives for analyzing discrete traits in a genome-wide prediction context.
Methods: This study presents two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) for analyzing discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be-observed records. Performances were compared based on the models' predictive ability.
Results: The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals; its phi correlation was up to 81% greater than that of the Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data.
Conclusions: The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine learning showed some advantages over Bayesian regressions. Boosting with a pseudo-Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent, and an initial evaluation of different methods is recommended when dealing with a particular problem.
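The phi correlation used above to compare classification performance can be computed from a 2x2 confusion table. The sketch below assumes the standard phi coefficient for binary outcomes; the abstract does not say how the authors computed it, and the function name and toy labels are hypothetical.

```python
import math

def phi_coefficient(observed, predicted):
    """Phi coefficient between two binary (0/1) sequences.

    Standard 2x2 formula:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when a margin is empty and the coefficient is undefined.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

# Toy labels, for illustration only: 1 = susceptible, 0 = resistant.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
phi = phi_coefficient(observed, predicted)
print(phi)  # 3 TP, 3 TN, 1 FP, 1 FN gives (9 - 1) / 16 = 0.5
```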
The objective of this study was to estimate (co)variance functions using random regression models on Legendre polynomials for the analysis of repeated measures of BW from birth to adult age. A total of 82,064 records from 8,145 females were analyzed. Different models were compared. The models included additive direct and maternal effects, and animal and maternal permanent environmental effects as random terms. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of animal age (cubic regression) were considered as random covariables. Eight models with polynomials of third to sixth order were used to describe additive direct and maternal effects, and animal and maternal permanent environmental effects. Residual effects were modeled using 1 (i.e., assuming homogeneity of variances across all ages) or 5 age classes. The model with 5 classes was the best to describe the trajectory of residuals along the growth curve. The model including fourth- and sixth-order polynomials for additive direct and animal permanent environmental effects, respectively, and third-order polynomials for maternal genetic and maternal permanent environmental effects was the best. Estimates of (co)variance obtained with the multi-trait and random regression models were similar. Direct heritability estimates obtained with the random regression models followed a trend similar to that obtained with the multi-trait model. The largest estimates of maternal heritability were those of BW taken close to 240 d of age. In general, estimates of correlation between BW from birth to 8 yr of age decreased with increasing distance between ages.
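The Legendre covariables used in such random regression models can be sketched as follows. The sketch assumes the usual steps of rescaling age to [-1, 1] and applying the normalization sqrt((2k + 1)/2) to the polynomial of degree k; the abstract does not describe these details, so both are assumptions, and the function name and example ages are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariables(ages, degree, age_min=None, age_max=None):
    """Normalized Legendre covariables for a vector of ages.

    Ages are mapped linearly onto [-1, 1]; column k holds
    sqrt((2k + 1) / 2) * P_k(x) for k = 0..degree.
    The normalization is an assumed convention.
    """
    ages = np.asarray(ages, dtype=float)
    lo = ages.min() if age_min is None else float(age_min)
    hi = ages.max() if age_max is None else float(age_max)
    x = -1.0 + 2.0 * (ages - lo) / (hi - lo)   # standardized age
    P = legendre.legvander(x, degree)          # columns P_0 .. P_degree
    k = np.arange(degree + 1)
    return P * np.sqrt((2.0 * k + 1.0) / 2.0)

# Example ages in days (birth, 240 d, 550 d, 8 yr); illustrative only.
ages = [0, 240, 550, 2920]
Phi = legendre_covariables(ages, degree=3)
print(Phi.round(4))
```

Each row of `Phi` would multiply an animal's random regression coefficients to give its effect at that age, so a (co)variance function between two ages takes the form Phi_i K Phi_j' for a coefficient covariance matrix K.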
Data sets of US Holsteins, Israeli Holsteins, and pigs from PIC (a Genus company, Hendersonville, TN) were used to evaluate the effect of different numbers of generations on the ability to predict genomic breeding values of young genotyped animals. The influence of including only 2 generations of ancestors (A2) or all ancestors (Af) was also investigated. A total of 34,506 US Holsteins, 1,305 Israeli Holsteins, and 5,236 pigs were genotyped. The evaluations were computed by traditional BLUP and single-step genomic BLUP, and computing performance was assessed for the latter method. For the 2 Holstein data sets, coefficients of determination (R²) and regression (δ) of deregressed evaluations from a full data set with records up to 2011 on estimated breeding values and genomic estimated breeding values from the truncated data sets were computed. The thresholds for data deletion were set by intervals of 5 yr, based on the average generation interval in dairy cattle. For the PIC data set, correlations between corrected phenotypes and estimated or genomic estimated breeding values were used to evaluate predictive ability on young animals born in 2010 and 2011. The reduced data set contained data up to 2009, and the thresholds were set based on an average generation interval of 3 yr. The number of generations that could be deleted without a reduction in accuracy depended on data structure and trait. For US Holsteins, removing 3 and 4 generations of data did not reduce accuracy of evaluations for final score in Af and A2 scenarios, respectively. For Israeli Holsteins, the accuracies for milk, fat, and protein yields were the highest when only phenotypes recorded in 2000 and later were included and full pedigrees were applied. Of the 135 Israeli bulls with genotypes (validation set) and daughter records only in the complete data set, 38 and 97 were sons of Israeli and foreign bulls, respectively. 
Although more phenotypic data increased the prediction accuracy for sons of Israeli bulls, the reverse was true for sons of foreign bulls. Also, more phenotypic data caused large inflation of genomic estimated breeding values for sons of foreign bulls, whereas the opposite was true with the deletion of all but the most recent phenotypic data. Results for protein and fat percentage were different from those for milk, fat, and protein yields; however, the changes in coefficients of determination and regression were relatively smaller for percentage traits. For the PIC data set, removing data from up to 5 generations did not erode predictive ability for genotyped animals for the 2 reproductive traits used in validation. Given the data used in this study, truncating old data reduces computation requirements but does not decrease the accuracy. For small populations that include local and imported animals, truncation may be beneficial for one group of animals and detrimental to another group.
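The two validation statistics named above, the coefficient of determination and the regression coefficient δ of deregressed evaluations on (G)EBV, can be sketched with a simple unweighted linear regression. Whether the authors used weights is not stated, so the unweighted form is an assumption; the function name and the numbers are hypothetical.

```python
import numpy as np

def validation_stats(deregressed, predicted):
    """Return (R2, delta) from regressing deregressed evaluations on predictions.

    delta is the slope of y = a + delta * x; values below 1 suggest
    inflated predictions. R2 is the squared correlation.
    Unweighted regression is an assumption of this sketch.
    """
    y = np.asarray(deregressed, dtype=float)
    x = np.asarray(predicted, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    sxy = xc @ yc
    delta = sxy / (xc @ xc)
    r2 = sxy ** 2 / ((xc @ xc) * (yc @ yc))
    return r2, delta

# Toy values, for illustration only: predictions from truncated data (x)
# and deregressed evaluations from the full data (y).
gebv = np.array([-1.0, 0.0, 1.0, 2.0])
dereg = 0.5 * gebv + np.array([0.1, -0.1, -0.1, 0.1])
r2, delta = validation_stats(dereg, gebv)
print(r2, delta)
```

In this toy case the slope is 0.5, which would be read as predictions spread twice as widely as the deregressed evaluations they are meant to predict.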
Data comprising 53,181 calving records were analyzed to estimate the genetic correlations of days to calving (DC) and days to first calving (DFC) with the following traits: scrotal circumference (SC), age at first calving (AFC), and weight adjusted for 550 d of age (W550) in a Nelore herd. (Co)variance components were estimated using the REML method fitting bivariate animal models. The fixed effects considered for DC were contemporary group, month of last calving, and age at breeding season (linear and quadratic effects). Contemporary groups were composed of herd, year, season, and management group at birth; herd and management group at weaning; herd, season, and management group at mating; and sex of calf and mating type (multiple sires, single sire, or AI). In DFC analysis, the same fixed effects were considered excluding the month of last calving. For DC, a repeatability animal model was applied. Noncalvers were not considered in analyses because an attempt to include them, attributing a penalty, did not improve the identification of genetic differences between animals. Heritability estimates ranged from 0.04 to 0.06 for DC, from 0.06 to 0.13 for DFC, from 0.42 to 0.44 for SC, and from 0.06 to 0.08 for AFC, and the estimate was 0.30 for W550. The genetic correlation estimated between DC and SC was low and negative (-0.10), between DC and AFC was high and positive (0.76), and between DC and W550 was almost null (0.07). Similar results were found for genetic correlation estimates between DFC and SC (-0.14), AFC (0.94), and W550 (-0.02). The genetic correlation estimates indicate that the use of DC in the selection of beef cattle may promote favorable correlated responses to age at first mating and, consequently, higher gains in sexual precocity can be expected.
This work aims to compare different nonlinear functions for describing the growth curves of Nelore females. The growth curve parameters, their (co)variance components, and environmental and genetic effects were estimated jointly through a Bayesian hierarchical model. In the first stage of the hierarchy, 4 nonlinear functions were compared: Brody, Von Bertalanffy, Gompertz, and logistic. The analyses were carried out using 3 different data sets to check goodness of fit while having animals with few records. Three different assumptions about the SD of fitting errors were considered: constancy throughout the trajectory, a linear increase until 3 yr of age and constancy thereafter, and variation following the nonlinear function applied in the first stage of the hierarchy. Comparisons of the overall goodness of fit were based on the Akaike information criterion, the Bayesian information criterion, and the deviance information criterion. Goodness of fit at different points of the growth curve was compared using Gelfand's check function. The posterior means of adult BW ranged from 531.78 to 586.89 kg. Greater estimates of adult BW were observed when the fitting error variance was considered constant along the trajectory. The models were not suitable to describe the SD of fitting errors at the beginning of the growth curve. All functions provided less accurate predictions at the beginning of growth, and predictions were more accurate after 48 mo of age. The prediction of adult BW using nonlinear functions can be accurate when growth curve parameters and their (co)variance components are estimated jointly. The hierarchical model used in the present study can be applied to the prediction of mature BW in herds in which a portion of the animals are culled before adult age. Gompertz, Von Bertalanffy, and Brody functions were adequate to establish mean growth patterns and to predict the adult BW of Nelore females. 
The Brody model was more accurate in predicting the birth weight of these animals and presented the best overall goodness of fit.
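The four growth functions compared in this abstract can be written down as a minimal sketch. The three-parameter forms below (asymptotic adult weight A, integration constant b, maturing rate k) are the parameterizations commonly found in the growth-curve literature; the abstract does not give the authors' exact forms, so they are an assumption, and the parameter values are hypothetical rather than estimates from the study.

```python
import numpy as np

def brody(t, A, b, k):
    """Brody: A * (1 - b * exp(-k t))."""
    return A * (1.0 - b * np.exp(-k * t))

def von_bertalanffy(t, A, b, k):
    """Von Bertalanffy: A * (1 - b * exp(-k t))**3."""
    return A * (1.0 - b * np.exp(-k * t)) ** 3

def gompertz(t, A, b, k):
    """Gompertz: A * exp(-b * exp(-k t))."""
    return A * np.exp(-b * np.exp(-k * t))

def logistic(t, A, b, k):
    """Logistic: A / (1 + b * exp(-k t))."""
    return A / (1.0 + b * np.exp(-k * t))

# Hypothetical parameters shared by all four functions purely to show the
# different curve shapes; in practice each function has its own estimates.
A, b, k = 550.0, 0.94, 0.06          # kg, unitless, per month
ages_mo = np.array([0.0, 8.0, 18.0, 48.0, 96.0])

curves = {f.__name__: f(ages_mo, A, b, k)
          for f in (brody, von_bertalanffy, gompertz, logistic)}
for name, weights in curves.items():
    print(name, weights.round(1))
```

All four curves converge to A as age increases, which is why A is read as the predicted adult BW; they differ mainly in the early part of the trajectory, where the abstract reports the poorest fit.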
Heritability estimates and genetic trends for growth and reproductive traits in Nelore cattle
0.171 (0.01); 0.219 (0.02); 0.186 (0.03); and 0.224 (0.02) kg per year, for WW, PW, GBW, and GWP, respectively, corresponding to increases of 0.10, 0.08,