Advancements in next-generation sequencing technology have enabled whole-genome re-sequencing in many species, providing unprecedented discovery and characterization of molecular polymorphisms. There are limitations, however, to next-generation sequencing approaches for species with large complex genomes such as barley and wheat. Genotyping-by-sequencing (GBS) has been developed as a tool for association studies and genomics-assisted breeding in a range of species including those with complex genomes. GBS uses restriction enzymes for targeted complexity reduction followed by multiplex sequencing to produce high-quality polymorphism data at a relatively low per sample cost. Here we present a GBS approach for species that currently lack a reference genome sequence. We developed a novel two-enzyme GBS protocol and genotyped bi-parental barley and wheat populations to develop a genetically anchored reference map of identified SNPs and tags. We were able to map over 34,000 SNPs and 240,000 tags onto the Oregon Wolfe Barley reference map, and 20,000 SNPs and 367,000 tags onto the Synthetic W9784×Opata85 (SynOpDH) wheat reference map. To further evaluate GBS in wheat, we also constructed a de novo genetic map using only SNP markers from the GBS data. The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence. Development of the sequenced reference genome(s) will in turn increase the utility of GBS data, enabling physical mapping of genes and haplotype imputation of missing data. Finally, as a result of low per-sample costs, GBS will have broad application in genomics-assisted plant breeding programs.
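To make restriction-enzyme complexity reduction concrete, here is a minimal in-silico sketch of a two-enzyme digest. The abstract does not name the enzymes, so the pair below (PstI, cutting CTGCA^G, and MspI, cutting C^CGG) is an assumption chosen for illustration, as are the names `ENZYMES`, `cut_sites`, and `pst_msp_fragments`. The sketch keeps only fragments flanked by one cut from each enzyme, the kind of subset a two-enzyme adapter design is meant to enrich; it ignores adapter chemistry, laboratory size selection, and fragments running off the ends of the input sequence.

```python
import re

# Assumed enzyme pair, for illustration only: (recognition motif, cut offset on the top strand).
ENZYMES = {
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "MspI": ("CCGG", 1),     # C^CGG
}

def cut_sites(seq):
    """Return sorted (position, enzyme) cut points; both motifs are palindromic, so one strand suffices."""
    seq = seq.upper()
    sites = []
    for name, (motif, offset) in ENZYMES.items():
        # The zero-width lookahead also finds overlapping occurrences of the motif.
        for m in re.finditer(f"(?={motif})", seq):
            sites.append((m.start() + offset, name))
    return sorted(sites)

def pst_msp_fragments(seq, min_len=1, max_len=10_000):
    """Fragments between consecutive cuts whose two ends come from different enzymes."""
    sites = cut_sites(seq)
    fragments = []
    for (start, left), (end, right) in zip(sites, sites[1:]):
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end, seq[start:end]))
    return fragments

example = "AAACTGCAGTTTTCCGGAAACCGGTTCTGCAGAA"
for start, end, frag in pst_msp_fragments(example):
    print(start, end, frag)
```

In the toy sequence, the fragment between the two MspI cuts is discarded, which is the complexity reduction at work: only mixed-end fragments survive.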
Despite important strides in marker technologies, the use of marker‐assisted selection has stagnated for the improvement of quantitative traits. Biparental mating designs for the detection of loci affecting these traits (quantitative trait loci [QTL]) impede their application, and the statistical methods used are ill‐suited to the traits' polygenic nature. Genomic selection (GS) has been proposed to address these deficiencies. Genomic selection predicts the breeding values of lines in a population by analyzing their phenotypes and high‐density marker scores. A key to the success of GS is that it incorporates all marker information in the prediction model, thereby avoiding biased marker effect estimates and capturing more of the variation due to small‐effect QTL. In simulations, the correlation between true breeding value and the genomic estimated breeding value has reached levels of 0.85 even for polygenic low heritability traits. This level of accuracy is sufficient to consider selecting for agronomic performance using marker information alone. Such selection would substantially accelerate the breeding cycle, enhancing gains per unit time. It would dramatically change the role of phenotyping, which would then serve to update prediction models and no longer to select lines. While research to date shows the exceptional promise of GS, work remains to be done to validate it empirically and to incorporate it into breeding schemes.
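The central idea in this abstract, fitting all markers at once and summing their effects into a GEBV, can be sketched with ridge regression, one common shrinkage model for GS (often called RR-BLUP). This is an illustration under stated assumptions, not the method of any particular study: marker scores are coded -1/0/1, the data are simulated with many small effects, and the shrinkage parameter `lam` is set by hand. Accuracy is measured as the correlation between GEBVs and the simulated true breeding values, the quantity the abstract reports from simulations.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Estimate all marker effects jointly by ridge regression (RR-BLUP-style shrinkage)."""
    Xc = X - X.mean(axis=0)          # center each marker column
    yc = y - y.mean()                # center phenotypes
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def gebv(X_new, X_train, beta):
    """GEBV = sum of estimated marker effects, with markers centered on the training means."""
    return (X_new - X_train.mean(axis=0)) @ beta

# Simulated, hypothetical data: 400 lines, 300 markers, many small effects, heritability about 0.5.
rng = np.random.default_rng(1)
n, p = 400, 300
X = rng.integers(-1, 2, size=(n, p)).astype(float)
true_effects = rng.normal(0.0, 0.05, p)
tbv = X @ true_effects                          # true breeding values
y = tbv + rng.normal(0.0, tbv.std(), n)         # phenotypes: equal genetic and error variance

train, test = slice(0, 300), slice(300, n)
beta = ridge_marker_effects(X[train], y[train], lam=200.0)
accuracy = np.corrcoef(gebv(X[test], X[train], beta), tbv[test])[0, 1]
print(f"GEBV accuracy on held-out lines: {accuracy:.2f}")
```

Note that the held-out lines are never phenotyped in the prediction step; only their marker scores are used, which is what allows selection before phenotyping.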
We intuitively believe that the dramatic drop in the cost of DNA marker information we have experienced should have immediate benefits in accelerating the delivery of crop varieties with improved yield, quality, and tolerance to biotic and abiotic stresses. But these traits are complex and affected by many genes, each with small effect. Traditional marker-assisted selection has been ineffective for such traits. The introduction of genomic selection (GS), however, has shifted that paradigm. Rather than seeking to identify individual loci significantly associated with a trait, GS uses all marker data as predictors of performance and consequently delivers more accurate predictions. Selection can be based on GS predictions, potentially leading to more rapid and lower cost gains from breeding. The objectives of this article are to review essential aspects of GS and summarize the important take-home messages from recent theoretical, simulation and empirical studies. We then look forward and consider research needs surrounding methodological questions and the implications of GS for long-term selection.
Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid genetic gains. However, with the increased popularity of GS approaches, numerous models have been proposed and no comparative analysis is available to identify the most promising ones. Using eight wheat (Triticum aestivum L.), barley (Hordeum vulgare L.), Arabidopsis thaliana (L.) Heynh., and maize (Zea mays L.) datasets, the predictive ability of currently available GS models along with several machine learning methods was evaluated by comparing accuracies, the genomic estimated breeding values (GEBVs), and the marker effects for each model. While a similar level of accuracy was observed for many models, the level of overfitting varied widely, as did the computation time and the distribution of marker effect estimates. Our comparisons suggested that GS in plant breeding programs could be based on a reduced set of models such as the Bayesian Lasso, weighted Bayesian shrinkage regression (wBSR, a fast version of BayesB), and random forest (RF) (a machine learning method that could capture nonadditive effects). Linear combinations of different models were tested as well as bagging and boosting methods, but they did not improve accuracy. This study also showed large differences in accuracy between subpopulations within a dataset that could not always be explained by differences in phenotypic variance and size. The broad diversity of empirical datasets tested here adds evidence that GS could increase genetic gain per unit of time and cost.
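The abstract reports that models with similar accuracy differed widely in overfitting. One simple proxy, assumed here rather than taken from the study, is the gap between the correlation a model achieves on its own training data and its cross-validated correlation. The sketch uses ridge regression as a stand-in for the models compared in the paper, simulated data, and hypothetical function names (`ridge_fit`, `cv_overfitting`); a nearly unpenalized fit (`lam` close to zero) should show a larger gap than a moderately penalized one.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data and return a prediction function."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return lambda X_new: mu_y + (X_new - mu_x) @ beta

def cv_overfitting(X, y, lam, k=5, seed=0):
    """Return (mean training r, cross-validated r, gap) for ridge regression at shrinkage lam."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    train_r = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        predict = ridge_fit(X[train_idx], y[train_idx], lam)
        train_r.append(np.corrcoef(predict(X[train_idx]), y[train_idx])[0, 1])
        pred[test_idx] = predict(X[test_idx])     # pooled out-of-fold predictions
    val_r = np.corrcoef(pred, y)[0, 1]
    return np.mean(train_r), val_r, np.mean(train_r) - val_r

# Simulated, hypothetical data: 150 lines, 100 markers, heritability about 0.5.
rng = np.random.default_rng(42)
n, p = 150, 100
X = rng.integers(-1, 2, size=(n, p)).astype(float)
g = X @ rng.normal(0.0, 0.1, p)
y = g + rng.normal(0.0, g.std(), n)

for lam in (1e-6, 70.0):
    train_r, val_r, gap = cv_overfitting(X, y, lam)
    print(f"lam={lam:g}: train r={train_r:.2f}, CV r={val_r:.2f}, gap={gap:.2f}")
```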
Genomic selection (GS) uses genomewide molecular markers to predict breeding values and make selections of individuals or breeding lines prior to phenotyping. Here we show that genotyping-by-sequencing (GBS) can be used for de novo genotyping of breeding panels and to develop accurate GS models, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. With GBS we discovered 41,371 single nucleotide polymorphisms (SNPs) in a set of 254 advanced breeding lines from CIMMYT's semiarid wheat breeding program. Four different methods were evaluated for imputing missing marker scores in this set of unmapped markers, including random forest regression and a newly developed multivariate-normal expectation-maximization algorithm, which gave more accurate imputation than heterozygous or mean imputation at the marker level, although no significant differences were observed in the accuracy of genomic-estimated breeding values (GEBVs) among imputation methods. Genomic-estimated breeding value prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat. Genotyping-by-sequencing combines marker discovery and genotyping of large populations, making it an excellent marker platform for breeding applications even in the absence of a reference genome sequence or previous polymorphism discovery. In addition, the flexibility and low cost of GBS make this an ideal approach for genomics-assisted breeding.
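Two of the four imputation methods named in the abstract are simple enough to sketch directly: mean imputation and heterozygous imputation. The code below assumes scores coded -1 and 1 for the two homozygotes, 0 for heterozygotes, and `NaN` for missing calls; the function names are hypothetical, and random forest regression and the multivariate-normal EM algorithm are not shown. A marker with no observed scores would yield `NaN` under mean imputation and would need to be filtered first.

```python
import numpy as np

def impute_mean(M):
    """Replace each missing score (NaN) with the observed mean of its marker column."""
    M = np.asarray(M, dtype=float).copy()
    col_means = np.nanmean(M, axis=0)
    rows, cols = np.where(np.isnan(M))
    M[rows, cols] = col_means[cols]
    return M

def impute_heterozygous(M):
    """Replace each missing score with 0, the heterozygous code."""
    M = np.asarray(M, dtype=float)
    return np.where(np.isnan(M), 0.0, M)

# Rows are lines, columns are markers.
scores = np.array([[1, -1, np.nan],
                   [1, np.nan, 1],
                   [np.nan, -1, 1]])
print(impute_mean(scores))
print(impute_heterozygous(scores))
```

Mean imputation fills a missing call with the marker's population average (a fractional value in general), whereas heterozygous imputation always uses 0; on inbred wheat lines, where heterozygotes are rare, that difference is one reason the two can diverge at the marker level.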
Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (i.e., GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated hyperparameters, and extended BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait, so estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes were not available on all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored.
The principle of genomic selection is to estimate simultaneously the effect of all markers in a training population consisting of phenotyped and genotyped individuals (Meuwissen et al. 2001). Genomic estimated breeding values (GEBVs) are then calculated as the sum of estimated marker effects for genotyped individuals in a prediction population. Fitting all markers simultaneously ensures that marker-effect estimates are unbiased, small effects are captured, and there is no multiple testing. Current genomic prediction models usually use only a single phenotypic trait.
However, new varieties of crops and animals are evaluated for their performance on multiple traits. Crop breeders record phenotypic data for multiple traits in categories such as yield components (e.g., grain weight or biomass), grain quality (e.g., taste, shape, color, nutrient content), and resistance to biotic or abiotic stress. To take advantage of genetic correlation in mapping causal loci, multi-trait QTL mapping methods have been developed using maximum-likelihood (Jiang and Zeng 1995) and Bayesian (Banerjee et al. 2008; Xu et al. 2009) methods. Calus and Veerkamp (2011) recently presented three multiple-trait genomic selection (MT-GS) models: ridge regression (GBLUP), BayesSSVS, and BayesCπ. The authors ranked the performances of these MT-GS methods (BayesSSVS > BayesCπ > GBLUP) based on simulated traits under a single genetic architecture. Genetic correlation was shown to be a key factor determining the MT-GS advantage over single-trait genomic selection (ST-GS). A few issues with these MT-GS methods still need attention. First, genetic architectu...
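GBLUP, one of the models discussed above, replaces explicit marker effects with a genomic relationship matrix G. The text does not say how G was built, so this sketch assumes VanRaden's first method (genotypes coded 0/1/2, centered by twice the sample allele frequency, and scaled by 2Σp(1 - p)), and it covers only the single-trait case, not the multivariate models of the study. The function names are hypothetical. The test checks a useful property: GBLUP genomic values equal the ridge-regression fit on the same centered markers when the ridge penalty is the GBLUP variance ratio λ multiplied by that scaling constant, which is why "sum of estimated marker effects" and GBLUP describe the same predictions.

```python
import numpy as np

def vanraden_G(M):
    """Genomic relationship matrix (VanRaden's first method).

    M: n x p genotype matrix coded 0/1/2 (copies of the counted allele).
    Returns G, the centered marker matrix Z, and the scaling constant."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies estimated from the sample
    Z = M - 2.0 * p                          # center each marker by twice its frequency
    scale = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / scale, Z, scale

def gblup(G, y, lam):
    """Single-trait GBLUP genomic values: G (G + lam I)^-1 (y - mean(y)), lam = sigma_e^2 / sigma_g^2."""
    yc = y - y.mean()
    return G @ np.linalg.solve(G + lam * np.eye(len(y)), yc)

# Hypothetical example: 60 lines, 200 markers, arbitrary phenotypes.
rng = np.random.default_rng(7)
M = rng.integers(0, 3, size=(60, 200)).astype(float)
y = rng.normal(size=60)
G, Z, scale = vanraden_G(M)
g_hat = gblup(G, y, lam=1.5)
print(g_hat[:5])
```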
Advancements in genotyping are rapidly decreasing marker costs and increasing genome coverage. This is facilitating the use of marker‐assisted selection (MAS) in plant breeding. Commonly employed MAS strategies, however, are not well suited for agronomically important complex traits, requiring extra time for field‐based phenotyping to identify agronomically superior lines. Genomic selection (GS) is an emerging alternative to MAS that uses all marker information to calculate genomic estimated breeding values (GEBVs) for complex traits. Selections are made directly on GEBV without further phenotyping. We developed an analytical framework to (i) compare gains from MAS and GS for complex traits and (ii) provide a plant breeding context for interpreting results from studies on GEBV accuracy. We designed MAS and GS breeding strategies with equal budgets for a high‐investment maize (Zea mays L.) program and a low‐investment winter wheat (Triticum aestivum L.) program. Results indicate that GS can outperform MAS on a per‐year basis even at low GEBV accuracies. Using a previously reported GEBV accuracy of 0.53 for net merit in dairy cattle, expected annual gain from GS exceeded that of MAS by about threefold for maize and twofold for winter wheat. We conclude that if moderate selection accuracies can be achieved, GS could dramatically accelerate genetic gain through its shorter breeding cycle.
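The comparison in this abstract rests on gain per unit time. A simplified version of that reasoning, assumed here rather than taken from the paper's full budget-based framework, is the breeder's equation divided by cycle length: ΔG per year = i · r · σ_A / L, with selection intensity i, accuracy r, additive genetic standard deviation σ_A, and years per cycle L. Holding i and σ_A equal for both strategies, GS beats MAS whenever r_GS / L_GS > r_MAS / L_MAS. The function names and all numbers in the example are hypothetical and are not the paper's maize or wheat parameters.

```python
def annual_gain(i, r, sigma_a, years_per_cycle):
    """Expected genetic gain per year from the breeder's equation: i * r * sigma_a / L."""
    return i * r * sigma_a / years_per_cycle

def break_even_accuracy(r_mas, years_mas, years_gs):
    """Smallest GS accuracy that matches MAS gain per year, assuming equal i and sigma_a."""
    return r_mas * years_gs / years_mas

# Hypothetical inputs for illustration only.
i, sigma_a = 1.4, 1.0
mas = annual_gain(i, r=0.7, sigma_a=sigma_a, years_per_cycle=4.0)
gs = annual_gain(i, r=0.5, sigma_a=sigma_a, years_per_cycle=1.5)
print(f"MAS: {mas:.3f}  GS: {gs:.3f}  ratio: {gs / mas:.2f}")
print(f"GS break-even accuracy: {break_even_accuracy(0.7, 4.0, 1.5):.3f}")
```

Even with a lower accuracy than MAS, the shorter cycle lets GS come out ahead per year, which is the mechanism the abstract points to when it concludes that moderate accuracies may suffice.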
Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates of selection. Originally, these models were developed without considering genotype × environment interaction (G×E). Several authors have proposed extensions of the single-environment GS model that accommodate G×E using either covariance functions or environmental covariates. In this study, we model G×E using a marker × environment interaction (M×E) GS model; the approach is conceptually simple and can be implemented with existing GS software. We discuss how the model can be implemented by using an explicit regression of phenotypes on markers or using covariance structures (a genomic best linear unbiased prediction-type model). We used the M×E model to analyze three CIMMYT wheat data sets (W1, W2, and W3), where more than 1000 lines were genotyped using genotyping-by-sequencing and evaluated at CIMMYT’s research station in Ciudad Obregon, Mexico, under simulated environmental conditions that covered different irrigation levels, sowing dates and planting systems. We compared the M×E model with a stratified (i.e., within-environment) analysis and with a standard (across-environment) GS model that assumes that effects are constant across environments (i.e., ignoring G×E). The prediction accuracy of the M×E model was substantially greater than that of an across-environment analysis that ignores G×E. Depending on the prediction problem, the M×E model had either similar or greater levels of prediction accuracy than the stratified analyses. The M×E model decomposes marker effects and genomic values into components that are stable across environments (main effects) and others that are environment-specific (interactions). Therefore, in principle, the interaction model could shed light on which variants have effects that are stable across environments and which ones are responsible for G×E.
The data set and the scripts required to reproduce the analysis are publicly available as Supporting Information.
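The M×E model described above can be set up either as an explicit regression on markers or through covariance structures. A minimal sketch of both views, with hypothetical function names: the main-effect design stacks every record's marker row, while the interaction design places each environment's rows in its own block of columns, so those effects are estimated separately per environment. In the covariance form, the interaction kernel is the main-effect kernel with all between-environment entries set to zero. The sketch assumes markers are already centered and standardized and uses XX'/p as the kernel; the phenotype model would then be y = μ + environment effect + X0·b0 + X1·b1 + e, and the fitting step itself is not shown.

```python
import numpy as np

def mxe_design(X_by_env):
    """Build M×E regression designs from a list of (n_j x p) marker matrices, one per environment.

    Returns X0 (main effects, rows stacked), X1 (environment-specific effects,
    block diagonal), and the environment label of every record."""
    p = X_by_env[0].shape[1]
    X0 = np.vstack(X_by_env)
    n_total = X0.shape[0]
    X1 = np.zeros((n_total, p * len(X_by_env)))
    env = np.empty(n_total, dtype=int)
    row = 0
    for j, Xj in enumerate(X_by_env):
        nj = Xj.shape[0]
        X1[row:row + nj, j * p:(j + 1) * p] = Xj   # environment j gets its own copy of each marker
        env[row:row + nj] = j
        row += nj
    return X0, X1, env

def mxe_kernels(X0, env):
    """Covariance form: main-effect kernel G0 and interaction kernel (G0 kept only within environments)."""
    G0 = X0 @ X0.T / X0.shape[1]
    same_env = (env[:, None] == env[None, :]).astype(float)
    return G0, G0 * same_env

# Hypothetical example: 20 standardized markers, three environments with 5, 4, and 6 records.
rng = np.random.default_rng(3)
p = 20
X0, X1, env = mxe_design([rng.normal(size=(5, p)), rng.normal(size=(4, p)), rng.normal(size=(6, p))])
G0, G1 = mxe_kernels(X0, env)
print(X0.shape, X1.shape, G1.shape)
```

The two views are equivalent: X1 X1'/p reproduces the within-environment kernel exactly, so the same model can be fitted with marker-based or GBLUP-type software.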