We intuitively believe that the dramatic drop in the cost of DNA marker information we have experienced should have immediate benefits in accelerating the delivery of crop varieties with improved yield, quality and biotic and abiotic stress tolerance. But these traits are complex and affected by many genes, each with small effect. Traditional marker-assisted selection has been ineffective for such traits. The introduction of genomic selection (GS), however, has shifted that paradigm. Rather than seeking to identify individual loci significantly associated with a trait, GS uses all marker data as predictors of performance and consequently delivers more accurate predictions. Selection can be based on GS predictions, potentially leading to more rapid and lower cost gains from breeding. The objectives of this article are to review essential aspects of GS and summarize the important take-home messages from recent theoretical, simulation and empirical studies. We then look forward and consider research needs surrounding methodological questions and the implications of GS for long-term selection.
Advancements in genotyping are rapidly decreasing marker costs and increasing genome coverage. This is facilitating the use of marker‐assisted selection (MAS) in plant breeding. Commonly employed MAS strategies, however, are not well suited for agronomically important complex traits, requiring extra time for field‐based phenotyping to identify agronomically superior lines. Genomic selection (GS) is an emerging alternative to MAS that uses all marker information to calculate genomic estimated breeding values (GEBVs) for complex traits. Selections are made directly on GEBV without further phenotyping. We developed an analytical framework to (i) compare gains from MAS and GS for complex traits and (ii) provide a plant breeding context for interpreting results from studies on GEBV accuracy. We designed MAS and GS breeding strategies with equal budgets for a high‐investment maize (Zea mays L.) program and a low‐investment winter wheat (Triticum aestivum L.) program. Results indicate that GS can outperform MAS on a per‐year basis even at low GEBV accuracies. Using a previously reported GEBV accuracy of 0.53 for net merit in dairy cattle, expected annual gain from GS exceeded that of MAS by about threefold for maize and twofold for winter wheat. We conclude that if moderate selection accuracies can be achieved, GS could dramatically accelerate genetic gain through its shorter breeding cycle.
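The per-year comparison described above can be illustrated with the standard breeder's equation, in which expected gain per year is R = i · r · σ_A / L (selection intensity, selection accuracy, additive genetic standard deviation, and cycle length in years). This is a minimal sketch of that reasoning, not the analytical framework developed in the study. The function name `annual_gain`, the selection intensity, the phenotypic selection accuracy, and both cycle lengths are hypothetical values chosen only for illustration; only the GEBV accuracy of 0.53 comes from the abstract.

```python
def annual_gain(intensity, accuracy, sigma_a, cycle_years):
    """Expected genetic gain per year from the breeder's equation.

    R = i * r * sigma_A / L, where i is selection intensity, r is
    selection accuracy, sigma_A is the additive genetic standard
    deviation, and L is the breeding cycle length in years.
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical scenario (NOT taken from the study): phenotype-based
# selection is more accurate per cycle but needs a longer cycle.
mas = annual_gain(intensity=1.75, accuracy=0.80, sigma_a=1.0, cycle_years=4.0)

# GEBV accuracy of 0.53 is the value quoted in the abstract; the
# one-year cycle is an assumption made for this sketch.
gs = annual_gain(intensity=1.75, accuracy=0.53, sigma_a=1.0, cycle_years=1.0)

ratio = gs / mas
print(f"GS gain per year relative to MAS: {ratio:.2f}x")
```

The sketch shows why a lower accuracy can still give a higher gain per year: accuracy enters the numerator, while cycle length divides the whole expression.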
Fusarium head blight (FHB) is a devastating disease of barley (Hordeum vulgare L.), causing reductions in yield and quality. Marker-based selection for resistance to FHB and lowered deoxynivalenol (DON) grain concentration would save considerable costs and time associated with phenotyping. A marker-based selection approach called genomic selection (GS) uses genomewide marker information to predict genetic value. We used a cross-validation approach that separated training sets from validation sets by both entry and environment. We used this framework to test the potential of GS for genetic improvement of FHB and DON as well as test the effect of different factors on prediction accuracy. Prediction accuracy for FHB was found to be as high as 0.72 and that for DON was found to be as high as 0.68. Little difference was found between marker effect estimation methods in terms of prediction of entry genetic value. The extensive linkage disequilibrium (LD) present in this population allowed the marker set to be reduced to 384 markers and training population (TP) size to be reduced to 200 with little effect on prediction accuracy. We found little to no advantage to combining subpopulations that correspond to neighboring breeding programs to increase TP size. Apparently, little genetic information is shared between subpopulations, either because of different marker-quantitative trait loci (QTL) linkage phases, different segregating QTL, or nonadditive gene action.
Population structure analyses and genome-wide association studies (GWAS) conducted on crop germplasm collections provide valuable information on the frequency and distribution of alleles governing economically important traits. The value of these analyses is substantially enhanced when the accession numbers can be increased from ~1,000 to ~10,000 or more. In this research, we conducted the first comprehensive analysis of population structure on the collection of 14,000 soybean accessions [Glycine max (L.) Merr. and G. soja Siebold & Zucc.] using a 50K-SNP chip. Accessions originating from Japan were relatively homogenous and distinct from the Korean accessions. As a whole, both Japanese and Korean accessions diverged from the Chinese accessions. The ancestry of founders of the American accessions derived mostly from two Chinese subpopulations, which reflects the composition of the American accessions as a whole. A 12,000 accession GWAS conducted on seed protein and oil is the largest reported to date in plants and identified single nucleotide polymorphisms (SNPs) with strong signals on chromosomes 20 and 15. A chromosome 20 region previously reported to be important for protein and oil content was further narrowed and now contains only three plausible candidate genes. The haplotype effects show a strong negative relationship between oil and protein at this locus, indicating negative pleiotropic effects or multiple closely linked loci in repulsion phase linkage. The vast majority of accessions carry the haplotype allele conferring lower protein and higher oil. Our results provide a fuller understanding of the distribution of genetic variation contained within the USDA soybean collection and how it relates to phenotypic variation for economically important traits.
Background: Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist in soybean. The objectives of this study were to (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Results: Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. Conclusions: Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
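The additive G-BLUP model and cross-validated accuracy mentioned above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the study's pipeline: the genomic relationship matrix uses a VanRaden-style formula, the variance ratio `lam` (residual variance over additive genetic variance) is treated as known rather than estimated by REML, and the data are simulated with 0/1/2 marker coding. The function names (`genomic_relationship`, `gblup_predict`, `cv_accuracy`, `simulate`) and all parameter values are hypothetical.

```python
import numpy as np


def genomic_relationship(markers):
    """VanRaden-style G matrix from an (n_lines x n_snps) array coded 0/1/2."""
    freq = markers.mean(axis=0) / 2.0            # allele frequency per SNP
    centered = markers - 2.0 * freq              # center each SNP column
    scale = 2.0 * np.sum(freq * (1.0 - freq))    # expected total SNP variance
    return centered @ centered.T / scale


def gblup_predict(G, y, train, test, lam):
    """Predict test lines from training lines; lam = sigma_e^2 / sigma_u^2 (assumed known)."""
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]               # training-by-training block
    G_vt = G[np.ix_(test, train)]                # validation-by-training block
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G_vt @ alpha


def cv_accuracy(G, y, k=5, lam=1.0, seed=0):
    """k-fold cross validation; accuracy = correlation of predicted vs observed."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        preds[test] = gblup_predict(G, y, train, test, lam)
    return np.corrcoef(preds, y)[0, 1]


def simulate(n_lines=300, n_snps=100, h2=0.5, seed=1):
    """Simulated markers and phenotypes (purely illustrative, not real data)."""
    rng = np.random.default_rng(seed)
    freq = rng.uniform(0.2, 0.8, n_snps)
    markers = rng.binomial(2, freq, size=(n_lines, n_snps)).astype(float)
    g = markers @ rng.normal(size=n_snps)        # additive genetic values
    noise_sd = np.sqrt(g.var() * (1.0 - h2) / h2)
    y = g + rng.normal(scale=noise_sd, size=n_lines)
    return markers, y


if __name__ == "__main__":
    markers, y = simulate()
    G = genomic_relationship(markers)
    print("cross-validated accuracy:", round(cv_accuracy(G, y), 2))
```

In practice the variance components would be estimated from the data, and missing genotypes would be imputed before building the relationship matrix; both steps are left out to keep the sketch short.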
One of the most important factors affecting genomic prediction accuracy appears to be training population (TP) composition. The objective of this study was to evaluate the effect of genomic relationship on genomic prediction accuracy and determine if adding increasingly unrelated individuals to a TP can reduce prediction accuracy. To accomplish this, a population of barley (Hordeum vulgare L.) lines from the University of Minnesota (lines denoted as MN) and North Dakota State University (lines denoted as ND) breeding programs were used for model training. Predictions were validated using two independent sets of progenies derived from MN × MN crosses and ND × ND crosses. Predictive ability sharply decreased with decreasing relationship between the TP and validation population (VP). More importantly, it was observed that adding increasingly unrelated individuals to the TP can actually reduce predictive ability compared with smaller TPs consisting of highly related individuals only. Reported results are possibly conditional on the relatively low marker density (342 single nucleotide polymorphisms [SNPs]) used. Nevertheless, these findings suggest plant breeding programs desiring to use genomic selection could benefit from focusing on good phenotyping of smaller TPs closely related to the selection candidates rather than developing large and diverse TPs.
Genome-wide association studies (GWAS) may benefit from utilizing haplotype information for making marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on such factors as genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested by the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose subsequent to marker variants, and 2) in analysis of empirical heading date data. These results demonstrate that the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
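The simple sliding-window approach to building haplotypes can be sketched as below. This is a hedged illustration only: the abstract does not state the window size or allele coding used, so the window length, the 0/1 coding for inbred lines, and the function name `sliding_window_haplotypes` are assumptions made for this example. Each window of adjacent SNPs is read as a binary string and converted to an integer haplotype label.

```python
import numpy as np


def sliding_window_haplotypes(snps, window=3):
    """Encode overlapping windows of adjacent SNPs as haplotype labels.

    snps: (n_lines x n_snps) array of 0/1 alleles for inbred lines.
    Returns an (n_lines x n_windows) integer array in which each entry
    is the window's alleles read as a binary number (e.g. 0,1,1 -> 3).
    """
    snps = np.asarray(snps, dtype=int)
    n_lines, n_snps = snps.shape
    if not 1 <= window <= n_snps:
        raise ValueError("window must be between 1 and the number of SNPs")
    # Leftmost SNP in the window is the most significant bit.
    weights = 2 ** np.arange(window - 1, -1, -1)
    n_windows = n_snps - window + 1
    codes = np.empty((n_lines, n_windows), dtype=int)
    for start in range(n_windows):
        codes[:, start] = snps[:, start:start + window] @ weights
    return codes


# Three hypothetical lines genotyped at three SNPs, window of two SNPs.
example = [[0, 1, 1],
           [0, 1, 0],
           [0, 1, 1]]
haplotypes = sliding_window_haplotypes(example, window=2)
print(haplotypes)
```

Lines sharing a label within a window carry the same multi-SNP haplotype there, so each window can be tested as a multi-allelic marker alongside the single SNPs.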
Prediction of single-cross performance has been a major goal of plant breeders since the beginning of hybrid breeding. Recently, genomic prediction has been shown to be a promising approach, but only limited studies have examined the accuracy of predicting single-cross performance. Moreover, no studies have examined the potential of predicting single crosses among random inbreds derived from a series of biparental families, which resembles the structure of germplasm comprising the initial stages of a hybrid maize breeding pipeline. The main objectives of this study were to evaluate the potential of genomic prediction for identifying superior single crosses early in the hybrid breeding pipeline and optimize its application. To accomplish these objectives, we designed and analyzed a novel population of single crosses representing the Iowa Stiff Stalk synthetic/non-Stiff Stalk heterotic pattern commonly used in the development of North American commercial maize hybrids. The performance of single crosses was predicted using parental combining ability and covariance among single crosses. Prediction accuracies were estimated using cross-validation and ranged from 0.28 to 0.77 for grain yield, 0.53 to 0.91 for plant height, and 0.49 to 0.94 for staygreen, depending on the number of tested parents of the single cross and genomic prediction method used. The genomic estimated general and specific combining abilities showed an advantage over genomic covariances among single crosses when one or both parents of the single cross were untested. Overall, our results suggest that genomic prediction of single crosses in the early stages of a hybrid breeding pipeline holds great potential to redesign hybrid breeding and increase its efficiency.