2019
DOI: 10.1007/s10994-019-05848-5

An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat

Abstract: In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods: elastic net, ridge regression…

Cited by 104 publications (61 citation statements)
References 86 publications
“…GS is fundamentally different from GWAS, as it involves use a full-genome information, regardless of its significance, in relation to a specific trait, rather than a few markers as in GWAS. This genotypic information, collected from training and validation population, is used in conjunction with corresponding phenotypic data, collected from training population, to develop a predictive model [12,14]. In forest tree breeding programs, GWAS and GS could substantially reduce the length of breeding cycles and increase genetic gain per unit time through early selection of superior genotypes during the juvenile phase.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Since these statistical methods cannot explicitly account for interactions among single nucleotide polymorphisms (SNPs), application of Machine Learning in GS studies has been proposed. Machine Learning is being increasingly applied in GS studies because it does not require any assumptions about the underlying traits, it is easy to use, and it can both capture complex non-linear relationships and efficiently increase prediction accuracy [14]. Popular Machine Learning methods include Random Forest (RF), Extreme Gradient Boosting (XgBoost) and Bayesian Additive Regression Tree (BART) modelling.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Previously proposed approaches for learning the imputation rules are based on regularized linear models [11][12][13][14], polygenic risk scores [11] and using the top SNP to predict expression levels [12]. However, the machine learning literature has shown that alternative approaches such as random forests (RF), which allow naturally for non-linear and non-additive effects, can produce more accurate predictions in model organisms [15,16]. We set out to explore whether using RF could also lead to better gene expression predictions in humans and, if so, whether that could be translated into a more powerful TWAS.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…We also sought to take advantage of the fact that expression levels of a given gene in different cell types can be correlated by considering expression values across multiple cell types simultaneously in a multi-task framework. This has been shown to improve multi-trait predictions in yeast [16] and in applications to real and simulated data in marker-assisted selection for several related traits [17][18][19] or populations [20]. Multi-trait approaches have also been used to analyse eQTL datasets [21,22].…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Machine learning algorithms are increasingly being adapted for the prediction of plant phenotypes (Grinberg et al. 2016, 2019). This task is most commonly regression based as most agronomic phenotypes are quantitative.…”
Section: Introduction (citation type: mentioning)
confidence: 99%