2022
DOI: 10.3389/fgene.2022.958780

Evaluating dimensionality reduction for genomic prediction

Abstract: The development of genomic selection (GS) methods has allowed plant breeding programs to select favorable lines using genomic data before performing field trials. Improvements in genotyping technology have yielded high-dimensional genomic marker data which can be difficult to incorporate into statistical models. In this paper, we investigated the utility of applying dimensionality reduction (DR) methods as a pre-processing step for GS methods. We compared five DR methods and studied the trend in the prediction…

Cited by 3 publications (4 citation statements)
References 54 publications
“…Dimensionality reduction finds applications in various domains such as feature selection, computational efficiency, efficient data storage, and noise reduction in data visualization. Moreover, it is instrumental in addressing statistical modeling and inference challenges that arise in cases characterized by the "large p, small n" problem, where the data dimension p significantly exceeds the sample size n. For example, dimensionality reduction becomes necessary in cases such as microarray data analysis 3 and genomic selection 4, where there are a large number of genetic variants (typically in the tens of thousands) but the number of samples available is relatively limited. As a crucial aspect of data preprocessing, dimensionality reduction methods are of great importance in the fields of machine learning and data science 5,6.…”
Section: Introduction
confidence: 99%
“…Despite significant progress in high-throughput phenotyping (HTP) during the last decade, phenotyping plants in comparable numbers to those of the SNPs is still not practical. This results in a situation often termed "the curse of dimensionality" (Manthena, Jarquín et al 2022), where the number of observations n is far smaller than the number of variables p (n << p). As a consequence, the models become very complex with millions of estimable parameters, are prone to overfit on the limited training data and do not generalise well on the test data.…”
Section: High Dimensionality
confidence: 99%
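One standard way to keep an n << p model from overfitting, alluded to above, is a ridge penalty. A hedged sketch (synthetic data, numpy only, not any cited author's method): because n << p, the dual "kernel" identity X^T (X X^T + λI_n)^{-1} y = (X^T X + λI_p)^{-1} X^T y lets us solve an n×n system instead of a p×p one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 80, 4000, 10.0       # n << p, ridge penalty strength lam (assumed)

X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:20] = 1.0             # only a few markers carry signal
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Ridge in dual form: an n x n solve rather than a p x p solve
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
beta_hat = X.T @ alpha           # implied p-dimensional coefficient vector

print(beta_hat.shape)            # (4000,)
```

The penalty value and data shapes are placeholders; in practice λ would be chosen by cross-validation.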
“…An approach to remove highly correlated SNPs uses binning of contiguous regions on the genome and selects only one SNP out of a bin (Du, Wei et al 2018), but an optimal selection of bin size is not straightforward. Dimensionality reduction can also be achieved by transformation of the genotype matrix into a low-dimensional representation (Manthena, Jarquín et al 2022). For instance, singular value decomposition of the genotype matrix can be employed to perform GP using a smaller number of principal components as features instead of using all SNPs (Pintus, Gaspa et al 2012, Du, Wei et al 2018, Odegard, Indahl et al 2018).…”
Section: Improvement Strategies: Mitigating High-Dimensionality
confidence: 99%