2021
DOI: 10.1038/s41437-021-00412-1

A guide for kernel generalized regression methods for genomic-enabled prediction

Abstract: The primary objective of this paper is to provide a guide on implementing Bayesian generalized kernel regression methods for genomic prediction in the statistical software R. Such methods are quite efficient for capturing complex non-linear patterns that conventional linear regression models cannot capture. Furthermore, these methods are also powerful for leveraging environmental covariates, as in genotype × environment (G×E) prediction, among other settings. In this study we provide the building process of seven kernel met…

Cited by 19 publications (19 citation statements) · References: 37 publications
“…Another advantage of the Bayesian multitrait kernel methods is that they can significantly reduce the computational resources needed in comparison with Ridge regression multitrait models, since instead of using the inputs (independent variables) directly, a transformed input is used whose dimension is usually smaller than the number of inputs. However, as with all kernel methods, because of this transformation of the input, the estimates of the beta coefficients are not interpretable as in conventional regression methods; for this reason, these methods do not help to further understand the complex relationship between input and output, and it is important to avoid false expectations about these methods (Montesinos-López et al. 2021) in terms of interpretability. Finally, as one reviewer pointed out, the successful implementation of the multitrait kernel method proposed here is straightforward when the dataset is balanced in the response variable (no missing data) and across environments; it is more complicated when the data are unbalanced, but the method still works provided the imbalance is handled explicitly.…”
Section: Discussion (mentioning)
confidence: 99%
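The dimension-reduction point quoted above is easy to see in code. Below is a minimal R sketch on toy data (not the authors' implementation): a Gaussian kernel turns a wide n × p marker matrix into an n × n representation, so the downstream Bayesian machinery works in n dimensions instead of p, at the cost of interpretable marker effects.

```r
## Minimal sketch (toy data): kernel transformation reduces the working
## dimension from p markers to n lines.
set.seed(1)
n <- 100; p <- 5000                               # illustrative sizes: n lines << p markers
X <- scale(matrix(rbinom(n * p, 2, 0.3), n, p))   # toy genotypes coded 0/1/2, scaled

D2 <- as.matrix(dist(X))^2                        # pairwise squared Euclidean distances
K  <- exp(-D2 / median(D2[D2 > 0]))               # Gaussian kernel, median-heuristic bandwidth

## Eigen-decompose K = Phi %*% t(Phi); Phi is the "transformed input" of the quote.
eig <- eigen(K, symmetric = TRUE)
Phi <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))
dim(Phi)                                          # 100 x 100, versus the 100 x 5000 of X
## The coefficients attached to Phi's columns are not marker effects, which is
## exactly why kernel fits are harder to interpret than ordinary regression.
```

In the paper's setting, such a kernel would then be passed to a Bayesian solver; in R this is commonly done with the BGLR package's RKHS model.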
“…However, the adoption of the Bayesian paradigm in plant breeding continues to grow due to great computational advancements and new methodological applications and elucidations. Bayesian MT models offer some of the following advantages mentioned by Montesinos-López et al. (2019b): (1) they allow prior information to be incorporated; (2) unlike restricted maximum likelihood, they do not need good starting values to estimate the parameters of interest; (3) they increase the precision of parameter estimates (smaller standard errors); (4) conclusions can be drawn about the correlations between the dependent variables, notably the extent to which the correlations depend on the individual and on the group level; (5) testing whether the effect of an explanatory variable on dependent variable Y1 is larger than its effect on Y2, when Y1 and Y2 were observed (totally or partially) in the same individuals, is possible only by means of a multivariate analysis; (6) a multivariate analysis is also required to carry out a single test of the joint effect of an explanatory variable on several dependent variables; such a single test can be useful, e.g., to avoid the danger of chance capitalization inherent in carrying out a separate test for each dependent variable; and (7) they do not have strong identifiability problems.…”
Section: Introduction (mentioning)
confidence: 99%
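As a concrete illustration of advantage (4), a multitrait kernel model can be fitted in R with the Multitrait() function of the BGLR package (assumed available in recent BGLR releases); all data and settings below are toy placeholders, not the cited study's configuration.

```r
## Hedged sketch of a multitrait Bayesian kernel fit with BGLR::Multitrait().
library(BGLR)
set.seed(2)
n <- 100
X  <- scale(matrix(rbinom(n * 500, 2, 0.3), n, 500))  # toy genotypes
D2 <- as.matrix(dist(X))^2
K  <- exp(-D2 / median(D2[D2 > 0]))                   # Gaussian kernel, as before
Y  <- cbind(trait1 = rnorm(n), trait2 = rnorm(n))     # toy phenotype matrix, n x 2

fit <- Multitrait(y = Y,
                  ETA = list(list(K = K, model = "RKHS")),
                  nIter = 3000, burnIn = 1000)
str(fit, max.level = 1)  # posterior summaries include a residual covariance
                         # matrix, from which between-trait correlations are read off
```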
“…The prediction accuracy of the approximate kernels depends on the number of subset lines and on the decay of the eigenvalues in the decomposition of the GRM. Further, Montesinos-López et al. [87] outlined the implementation of the sparse matrices from Cuevas et al. [76]. They integrated them with the Bayesian methods from Cuevas et al. [85] to create linear, polynomial, sigmoid, Gaussian, exponential, and arc-cosine (with one or more hidden layers) kernels in both a multi-environment and a multi-trait framework.…”
Section: Model (mentioning)
confidence: 99%
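For concreteness, the sketch below gives standard closed forms for the kernel families named in this passage, each computed from a toy scaled genotype matrix X; the bandwidths and sigmoid constants are placeholders rather than the tuned values of the cited papers, and the arc-cosine kernel follows the usual one-hidden-layer form of Cho and Saul.

```r
## Standard forms of the kernel families listed above (constants illustrative).
set.seed(1)
X <- scale(matrix(rbinom(100 * 1000, 2, 0.3), 100, 1000))  # toy genotypes

G      <- tcrossprod(X) / ncol(X)                    # linear (GBLUP-type) kernel
K_poly <- (G + 1)^2                                  # polynomial kernel, degree 2
K_sig  <- tanh(0.01 * tcrossprod(X) + 1)             # sigmoid kernel
D2     <- as.matrix(dist(X))^2
K_gau  <- exp(-D2 / median(D2[D2 > 0]))              # Gaussian kernel
K_exp  <- exp(-sqrt(D2) / median(sqrt(D2)[D2 > 0]))  # exponential kernel

## One-hidden-layer arc-cosine kernel; deeper versions apply the same map
## recursively to the previous layer's kernel.
nrm    <- sqrt(rowSums(X^2))
ctheta <- pmin(pmax(tcrossprod(X) / outer(nrm, nrm), -1), 1)  # clamp for acos()
theta  <- acos(ctheta)
K_arc  <- (1 / pi) * outer(nrm, nrm) * (sin(theta) + (pi - theta) * cos(theta))
```

Any of these n × n matrices can be supplied to a Bayesian RKHS solver in place of the raw markers.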
“…Kernels have proven useful in helping conventional machine learning algorithms capture non-linear patterns in data (Montesinos-López et al., 2021b; Montesinos-López et al., 2022a). In addition to capturing complex non-linear patterns, the sparse version of kernel methods can also save significant computational resources without a relevant loss in prediction accuracy (Montesinos-López et al., 2021b; Montesinos-López et al., 2022a). In this paper, by sparse kernels we mean kernels built with only a fraction of the total inputs, under the assumption that the input matrix is sparse, that is, a matrix containing mostly zeros.…”
Section: Introduction (mentioning)
confidence: 99%
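A minimal sketch of that idea, assuming a Nyström-type construction (the subset size m and all names are illustrative): the Gaussian kernel is evaluated against only m ≪ n reference lines, and the full n × n kernel is recovered as a rank-m approximation.

```r
## Hedged sketch of a sparse, Nystrom-type kernel approximation on toy data.
set.seed(3)
n <- 100; m <- 20
X   <- scale(matrix(rbinom(n * 1000, 2, 0.3), n, 1000))  # toy genotypes
idx <- sample(n, m)                                      # m reference lines
Xm  <- X[idx, , drop = FALSE]

d2   <- outer(rowSums(X^2), rowSums(Xm^2), "+") - 2 * tcrossprod(X, Xm)  # n x m squared distances
K_nm <- exp(-d2 / median(d2[d2 > 0]))                    # n x m cross-kernel
K_mm <- K_nm[idx, ]                                      # m x m block among references
K_sparse <- K_nm %*% solve(K_mm + diag(1e-8, m), t(K_nm))  # rank-m approximation of K
```

Only the n × m block is ever evaluated from data, which is where the computational saving comes from when m is much smaller than n.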