Key Message: Multi-trait genomic prediction models help allocate available resources in breeding programs through targeted phenotyping of correlated traits when predicting expensive and labor-intensive quality parameters. Abstract: Multi-trait genomic prediction models can be used to predict labor-intensive or expensive traits from correlated traits, which can be phenotyped at greater depth than the target traits, reducing resource use and improving prediction accuracy. This is particularly important when allocating phenotyping resources in plant breeding programs. The objective of this work was to evaluate the predictive ability of multi-trait models with different depths of phenotypic information from correlated traits. We evaluated 495 advanced wheat breeding lines, genotyped with genotyping-by-sequencing, for eight baking quality traits. Using different cross-validation approaches, we evaluated the predictive ability of a single-trait model and a multi-trait model. Moreover, we evaluated different sizes of the training population (from 50 to 396 individuals) for the trait of interest, different depths of phenotypic information for correlated traits (50 and 100%), and the number of correlated traits used (one to three). When correlated traits were used, reducing the training population to as few as 149 individuals (30%) caused no loss in predictive ability. A multi-trait model with one highly correlated trait phenotyped in both the training and testing sets was the best model considering phenotyping resources and the gain in predictive ability. Including correlated traits in the training and testing lines is a strategic approach to replace phenotyping of labor-intensive and high-cost traits in a breeding program. Electronic supplementary material: The online version of this article (10.1007/s00122-018-3186-3) contains supplementary material, which is available to authorized users.
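The phenotyping scenarios described above (a reduced training set for the target trait, with a correlated trait recorded at 50 or 100% depth) can be pictured as a table of flags per breeding line. The sketch below is only an illustration under stated assumptions: the function name `build_phenotyping_design`, the random sampling scheme, and the line identifiers are hypothetical and are not taken from the article, which does not describe how lines were assigned.

```python
import random

def build_phenotyping_design(line_ids, n_train_target, depth_correlated, seed=0):
    """Return {line_id: {"target": bool, "correlated": bool}} flags.

    target     -- True if the expensive trait is phenotyped (training lines only)
    correlated -- True if the cheaper correlated trait is phenotyped
    Random assignment is an assumption made for this sketch.
    """
    if not 0.0 <= depth_correlated <= 1.0:
        raise ValueError("depth_correlated must be in [0, 1]")
    rng = random.Random(seed)
    shuffled = list(line_ids)
    rng.shuffle(shuffled)
    train = set(shuffled[:n_train_target])
    # The correlated trait is phenotyped on a random fraction of ALL lines,
    # so it can cover testing lines as well as training lines.
    n_corr = round(depth_correlated * len(shuffled))
    corr = set(rng.sample(shuffled, n_corr))
    return {i: {"target": i in train, "correlated": i in corr} for i in line_ids}

# 495 lines, 149 of them phenotyped for the target trait, full depth for the
# correlated trait (the sizes come from the abstract; the ids are made up).
lines = [f"line_{k}" for k in range(495)]
design = build_phenotyping_design(lines, n_train_target=149, depth_correlated=1.0)
n_target = sum(d["target"] for d in design.values())
n_corr = sum(d["correlated"] for d in design.values())
print(n_target, n_corr)  # 149 495
```

A multi-trait model would then be fitted with the target trait set to missing wherever `target` is False, while the correlated trait contributes information for every line where `correlated` is True.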
Genomic selection (GS) has been used successfully in plant breeding to improve selection efficiency and reduce breeding time and cost. However, there is no clear strategy for incorporating genotype × environment interaction (GEI) into GS models. Increased prediction accuracy could be achieved by using mixed models to exploit GEI, borrowing information from other environments. The objective of this work was to compare strategies to exploit GEI in GS using mixed models. Specifically, we compared strategies to predict new genotypes by borrowing information from other environments while modeling the correlation matrix across environments, and strategies to design sets of environments with low GEI to predict genomic performance in new environments. We evaluated 1477 advanced wheat (Triticum aestivum L.) lines, genotyped with genotyping‐by‐sequencing (GBS), for yield in 35 location–year combinations. Mixed models were used to obtain either overall or by‐environment predictions for different sets of environments. Overall accuracy was high (0.5). Borrowing information from relatives evaluated in multiple environments and modeling the correlation matrix across environments was the best strategy to predict new genotypes. On the other hand, the best strategy for predicting the performance of genotypes in new environments was either to predict across locations for single years or to predict within defined mega‐environments (MEs) for any year or location. In summary, higher predictive ability was obtained by characterizing and modeling GEI in the GS context.
In crop breeding, interest in predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, to the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed model using moving means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model gave the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
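The moving-means covariate mentioned above can be illustrated with a small sketch: for each plot, average the phenotypes of the surrounding plots and use that average as a covariate for local field trends. The neighbourhood shape, the `radius` parameter, and the function name `moving_mean_covariate` are assumptions made for illustration; the abstract does not specify how the moving means were defined.

```python
def moving_mean_covariate(field, radius=1):
    """For each plot, average the phenotypes of neighbouring plots.

    field  -- list of rows, each a list of plot values (None = missing plot)
    radius -- neighbourhood half-width in plots (hypothetical default)
    The focal plot itself is excluded so its own value does not leak in.
    """
    n_rows, n_cols = len(field), len(field[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = []
            for rr in range(max(0, r - radius), min(n_rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(n_cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and field[rr][cc] is not None:
                        neighbours.append(field[rr][cc])
            if neighbours:
                out[r][c] = sum(neighbours) / len(neighbours)
    return out

# Hypothetical 3 x 3 field of yields (not data from the study)
field = [
    [4.0, 5.0, 6.0],
    [5.0, 9.0, 7.0],
    [6.0, 7.0, 8.0],
]
cov = moving_mean_covariate(field)
print(cov[1][1])  # 6.0, the mean of the eight plots around the centre
```

The resulting grid would enter the mixed model as a fixed covariate alongside the genotype effect, so that a plot sitting in a high-yielding patch is adjusted downward and vice versa.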
The single most important decision in plant breeding programs is the selection of appropriate crosses. The ideal cross would provide superior predicted progeny performance and enough diversity to maintain genetic gain. The aim of this study was to compare the best crosses predicted using combinations of mid-parent value and variance prediction accounting for linkage disequilibrium (V) or assuming linkage equilibrium (V). After predicting the mean and the variance of each cross, we selected crosses based on mid-parent value, the top 10% of the progeny, and weighted mean and variance within progenies for grain yield, grain protein content, mixing time, and loaf volume in two applied wheat (Triticum aestivum L.) breeding programs: Instituto Nacional de Investigación Agropecuaria (INIA) Uruguay and CIMMYT Mexico. Although the variance of the progeny is important to increase the chances of finding superior individuals from transgressive segregation, we observed that the mid-parent values of the crosses drove the genetic gain, whereas the variance of the progeny had a small impact on genetic gain for grain yield. However, the relative importance of the variance of the progeny was larger for quality traits. Overall, the genomic resources and the statistical models are now available to plant breeders to predict both the performance of breeding lines per se and the value of progeny from any potential crosses.
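One common way to combine a cross's predicted mean and variance into a "top 10% of the progeny" criterion is the expected mean of the best fraction of a normal distribution, mean + i × standard deviation, where i is the selection intensity. The sketch below uses that textbook formula under the assumption of normally distributed progeny values; it is not confirmed to be the exact criterion used in the study, and the cross names and values are hypothetical.

```python
from statistics import NormalDist

def selection_intensity(p):
    """Standardised mean of the top fraction p of a normal distribution."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)  # truncation point
    return std_normal.pdf(z) / p

def top_fraction_mean(mid_parent, progeny_variance, p=0.10):
    """Expected mean of the best fraction p of progeny from one cross."""
    return mid_parent + selection_intensity(p) * progeny_variance ** 0.5

# Hypothetical crosses: (mid-parent value, predicted progeny variance)
crosses = {"A x B": (5.0, 0.04), "C x D": (4.9, 0.25)}
ranked = sorted(crosses, key=lambda k: top_fraction_mean(*crosses[k]), reverse=True)
print(ranked)  # ['C x D', 'A x B']
```

In this toy case the cross with the lower mid-parent value ranks first because its larger variance more than compensates; when variances differ little between crosses, the ranking follows the mid-parent value, which matches the pattern the abstract reports for grain yield.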
The effectiveness of genomic selection in breeding programs depends on the phenotypic quality and depth, the prediction model, the number and type of molecular markers, and the size and composition of the training population (TR). Furthermore, population structure and diversity have a key role in the composition of the optimal training sets. Our goal was to compare strategies for optimizing the TR for specific testing populations (TE). A total of 1353 wheat (Triticum aestivum L.) and 644 rice (Oryza sativa L.) advanced lines were evaluated for grain yield in multiple environments. Several within-TR optimization strategies were compared to identify groups of individuals with increased predictive ability. Additionally, optimization strategies to choose individuals from the TR with higher predictive ability for a specific TE were compared. There is a benefit in considering both the population structure and the relationship between the TR and the TE when designing an optimal TR for genomic selection. A weighted relationship matrix with stratified sampling is the best strategy for forward predictions of quantitative traits in populations several generations apart. Genomic selection (GS) consists of selecting individuals from a TE on the basis of genotypic values predicted from their genome-wide molecular marker scores and a statistical model adjusted with individuals that have phenotypic and genotypic information (Meuwissen et al., 2001). The group of individuals that were phenotyped and genotyped is called the TR (Heffner et al., 2009). Genomic selection is preferred over marker-assisted selection approaches for complex traits (Habier et al., 2007; Lorenz et al., 2011) because it includes all molecular markers in the prediction model and because it considers the quantitative trait loci of both major and minor effects (Xu, 2003; Jannink et al., 2010; Poland and Rife, 2012; Smith et al., 2018).
Simulated and empirical cross-validation studies in plants show that GS can accelerate progress in plant breeding compared with marker-assisted selection, resulting in higher genetic gains (
Association mapping has been proposed to identify polymorphisms involved in phenotypic variations and may prove useful in identifying interesting alleles for breeding purposes. Using this approach, a total of 382 cultivars and advanced lines of spring wheat obtained from three breeding programs (Chile, Uruguay and CIMMYT) were evaluated for plant height (PH), kernels per spike (KS), 1,000 kernel weight (TKW), grain yield and carbon isotope discrimination (Δ13C) and tested for genotyping-by-sequencing-derived SNP markers across the hexaploid wheat genome. A Bayesian clustering approach via Markov chain Monte Carlo was performed to examine the genetic differentiation (FST) among different genetic groups. The results indicated the existence of two distinct and strongly differentiated genetic groups. Cluster I contained 215 genotypes (56.3%), over 60% (137/215) of which were collected from CIMMYT. Cluster II showed the highest FST value, according to the 95% credible interval. Linkage disequilibrium (LD) among SNPs was calculated for the A, B and D genomes and at the whole-genome level. LD decayed over a longer genetic distance for the D genome than for the A and B genomes. In the A and B genomes, LD declined to 50% of its initial value at about 2 cM. In the D genome, LD was much more extensive, declining to 50% of its initial value only at 22 cM. In the whole genome, LD declined to 50% of its initial value at an average of 4 cM. Important genomic regions associated with complex traits in spring wheat were identified. Selection on these regions may increase the efficiency of the current breeding programs. Although most of the associations were environment specific, some stable associations were detected for Δ13C, KS, PH and TKW.
Chromosomes 1A, 3A, 4A and 5A were the most important chromosomes, as they comprised quantitative trait loci (QTL) for Δ13C, a trait that can be used as an indirect tool for increased water-use efficiency in wheat. Environment-specific genomic regions were detected, indicating the presence of QTL-by-environment interaction. To produce suitable genotypes under contrasting water availability conditions, QTL × E interactions (and genotype-by-environment interaction) should be considered in the current spring wheat breeding program.
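The statement that LD "declined to 50% of its initial value" at a given distance can be made concrete with a small sketch that reads the half-decay distance off an LD decay curve by linear interpolation. The function name `half_decay_distance` and the example curve are hypothetical; the abstract does not say how the decay curve was fitted or how the half-decay point was located.

```python
def half_decay_distance(curve):
    """Distance at which LD first drops to half of its initial value.

    curve -- list of (distance_cM, r2) pairs sorted by distance; the first
             pair is taken as the initial LD. Linear interpolation is used
             between the two points that bracket the half-way level.
    Returns None if LD never falls that far within the curve.
    """
    half = curve[0][1] / 2.0
    for (d0, r0), (d1, r1) in zip(curve, curve[1:]):
        if r0 > half >= r1:
            return d0 + (r0 - half) * (d1 - d0) / (r0 - r1)
    return None

# Hypothetical fitted decay curve (not data from the study)
curve = [(0.0, 0.40), (1.0, 0.30), (3.0, 0.10), (10.0, 0.05)]
print(half_decay_distance(curve))
```

With this made-up curve the initial LD of 0.40 falls to 0.20 between 1 and 3 cM, so the interpolated half-decay distance is about 2 cM. Running the same function on separate curves for the A, B and D genomes would give per-genome distances comparable to those quoted above.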
Background: Whole-genome genotyping techniques such as genotyping-by-sequencing (GBS) are being used for genetic studies such as genome-wide association studies (GWAS) and genome-wide selection (GS), and different strategies for imputation have been developed. Nevertheless, imputation error may lead to poor performance (i.e. lower power or a higher false positive rate) in analyses such as GWAS, where complete data are not required and markers are tested one at a time. The aim of this study was to compare the performance of GWAS analysis for quantitative trait loci (QTL) of major and minor effect using different imputation methods when no reference panel is available in a wheat GBS panel. Results: In this study, we compared the power and false positive rate of dissecting quantitative traits for imputed and not-imputed marker score matrices in: (1) a complete molecular marker barley panel array, and (2) a GBS wheat panel with missing data. We found that there is an ascertainment bias in imputation method comparisons. Simulating over a complete matrix and creating missing data at random showed that imputation methods have poorer performance. Furthermore, we found that when QTL were simulated with imputed data, the imputation methods performed better than the not-imputed ones. On the other hand, when QTL were simulated with not-imputed data, the not-imputed method and one of the imputation methods performed better for dissecting quantitative traits. Moreover, larger differences between imputation methods were detected for QTL of major effect than for QTL of minor effect.
We also compared the different marker score matrices for GWAS analysis in a real wheat phenotype dataset, and we found minimal differences, indicating that imputation did not improve GWAS performance when a reference panel was not available. Conclusions: GWAS performance was poorer when an imputed marker score matrix was used and no reference panel was available in a wheat GBS panel. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3120-5) contains supplementary material, which is available to authorized users.
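The evaluation idea described above, starting from a complete marker matrix, creating missing data at random, imputing, and comparing against the known truth, can be sketched as follows. Marker-mean imputation is used here only as a simple stand-in; the abstract does not name the imputation methods compared, and the function names, matrix size, missing rate, and -1/0/1 coding are all assumptions made for illustration.

```python
import random

def mask_at_random(matrix, missing_rate, seed=0):
    """Copy a complete marker matrix, setting entries to None at random."""
    rng = random.Random(seed)
    return [[None if rng.random() < missing_rate else v for v in row]
            for row in matrix]

def impute_marker_mean(matrix):
    """Replace None with the mean of the observed scores for that marker."""
    n_markers = len(matrix[0])
    means = []
    for j in range(n_markers):
        observed = [row[j] for row in matrix if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in matrix]

def imputation_error(truth, masked, imputed):
    """Mean absolute error over the entries that were masked."""
    errs = [abs(t - i)
            for tr, mr, ir in zip(truth, masked, imputed)
            for t, m, i in zip(tr, mr, ir) if m is None]
    return sum(errs) / len(errs) if errs else 0.0

# Hypothetical complete matrix: 200 lines x 50 markers coded -1/0/1
rng = random.Random(1)
truth = [[rng.choice((-1, 0, 1)) for _ in range(50)] for _ in range(200)]
masked = mask_at_random(truth, missing_rate=0.2)
imputed = impute_marker_mean(masked)
print(round(imputation_error(truth, masked, imputed), 3))
```

Because the true scores are known, the error can be measured directly. The same masked and imputed matrices could then be passed to a GWAS routine to compare power and false positive rate against the not-imputed matrix.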