2023
DOI: 10.1101/2023.03.15.532652
Preprint

An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction

Abstract: Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from …
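
The abstract is truncated here, so PROSPER's exact objective is not shown. The sketch below is a hedged illustration of a penalized regression fit from GWAS summary statistics. It is a generic single-ancestry lasso and not PROSPER's multi-ancestry objective or its ensembling step. It minimizes `0.5 * beta' R beta - b' beta + lam * ||beta||_1`, where `b` holds marginal SNP effects and `R` is an LD (SNP correlation) matrix, using cyclic coordinate descent. It then scores individuals as `PRS_i = sum_j beta_j * g_ij`. The function names, the objective's scaling and the toy inputs are assumptions made for illustration.

```python
"""Generic lasso on GWAS summary statistics (illustrative sketch, not PROSPER)."""


def soft_threshold(x: float, lam: float) -> float:
    """Lasso shrinkage operator: move x towards zero by lam, clipping at zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_summary_stats(b, R, lam, n_iter=1000, tol=1e-10):
    """Minimise 0.5*beta'R beta - b'beta + lam*||beta||_1 by cyclic coordinate descent.

    b   -- marginal (standardised) SNP effect estimates from a GWAS (assumed input)
    R   -- LD matrix: SNP correlations, positive semi-definite, unit diagonal
    lam -- L1 penalty; larger values give sparser effect estimates
    """
    p = len(b)
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Marginal effect minus the part already explained by correlated SNPs.
            r_j = b[j] - sum(R[j][k] * beta[k] for k in range(p) if k != j)
            new_beta_j = soft_threshold(r_j, lam) / R[j][j]
            max_change = max(max_change, abs(new_beta_j - beta[j]))
            beta[j] = new_beta_j
        if max_change < tol:  # stop once a full sweep barely moves any coefficient
            break
    return beta


def polygenic_score(genotypes, beta):
    """PRS_i = sum_j beta_j * g_ij, with g_ij the allele count (0, 1 or 2)."""
    return [sum(g * w for g, w in zip(row, beta)) for row in genotypes]


if __name__ == "__main__":
    b = [0.5, -0.2, 0.05]  # hypothetical marginal effects for three SNPs
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy LD: independent SNPs
    beta = lasso_summary_stats(b, R, lam=0.1)
    print(beta)
    print(polygenic_score([[0, 1, 2], [2, 0, 1]], beta))
```

With an identity LD matrix, each coefficient is simply the soft-thresholded marginal effect. The third SNP (0.05) therefore falls below the penalty and is set to exactly zero. With correlated SNPs, the update subtracts the effect already carried by LD partners before shrinking.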

Cited by 7 publications (21 citation statements)
References 66 publications

“…While a large portion of GWAS data currently comes from individuals of European ancestry, it is well-known that polygenic risk scores do not transfer well between individuals of different ancestries, which can impact their utility for patients of non-European ancestry [64][65][66][67][68][69] . Many methods that utilize summary data from multiple populations have already been proposed and demonstrate improved prediction in under-represented populations 25,29,30,[70][71][72][73][74] . As genetics research moves towards greater diversity, ALL-Sum serves as a valuable foundation for extension to incorporating data from multiple ancestries, as well as admixed individuals [75][76][77] .…”
Section: Discussion (mentioning)
confidence: 99%
“…In high-dimensional problems such as PRS which involve a large number of SNPs, the best tuning parameters may be in the gaps or outside the bounds of the fixed grid being considered 23 . Recent works, in both theory and PRS applications, have demonstrated that ensembling multiple predictors can yield better prediction accuracy than choosing a single best predictor from grid search [24][25][26][27][28][29][30] .…”
Section: Main (mentioning)
confidence: 99%
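
The statement above contrasts choosing the single best grid point with ensembling several predictors. The sketch below makes that contrast concrete. It takes candidate PRS computed on a tuning set (one per grid point) and compares two choices. The first is the single candidate with the highest squared correlation. The second is an ordinary-least-squares linear combination of all candidates. This is a generic linear-stacking sketch on assumed toy data, and the helper names are hypothetical. It is not the specific super-learning procedure used by PROSPER or the cited works. OLS with an intercept maximizes in-sample squared correlation over all linear combinations, so the ensemble can never score lower than the best single candidate on the data it was fit to. A fair comparison therefore needs held-out validation samples.

```python
"""Grid search vs. a linear ensemble of candidate PRS (illustrative sketch)."""


def solve(a, rhs):
    """Solve a small dense system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(pred, y):
    """Squared Pearson correlation, a common PRS accuracy metric."""
    n = len(y)
    mp, my = sum(pred) / n, sum(y) / n
    sxy = sum((p - mp) * (t - my) for p, t in zip(pred, y))
    sxx = sum((p - mp) ** 2 for p in pred)
    syy = sum((t - my) ** 2 for t in y)
    return sxy * sxy / (sxx * syy)


def fit_linear_ensemble(candidates, y):
    """Least-squares weights (intercept first) for y ~ 1 + cand_1 + ... + cand_K."""
    rows = [[1.0] + [c[i] for c in candidates] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(xtx, xty)


def ensemble_predict(candidates, weights):
    """Apply intercept + weighted sum of candidate scores to each individual."""
    return [weights[0] + sum(w * c[i] for w, c in zip(weights[1:], candidates))
            for i in range(len(candidates[0]))]


if __name__ == "__main__":
    # Hypothetical tuning-set PRS from two grid points, and a toy phenotype.
    candidates = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 0.0, 1.0, 2.0]]
    y = [1.0, 2.5, 2.0, 4.5, 4.0]
    best_single = max(r_squared(c, y) for c in candidates)
    w = fit_linear_ensemble(candidates, y)
    print(best_single, r_squared(ensemble_predict(candidates, w), y))
```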
“…Similarly, SDPRX linear 2 and XPASS linear 2 linearly combined the standardized PRS of two populations to obtain the final PRS. Conversely, MUSSEL super max and PROSPER super max consider a super learning step that substitutes the linear combination by the non-linear machine learning techniques, and use the ensemble learning instead of selecting the optimally-tuned parameters, following their original work [18,19].…”
Section: Existing Methods (mentioning)
confidence: 99%
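
The statement above contrasts two ways of merging population-specific scores. The sketch below illustrates only the simpler, linear one. Each population's PRS is standardized to mean 0 and SD 1. The two z-scores are then combined with weights fit by least squares against a phenotype in a tuning sample. It is a hedged, generic sketch, not the exact SDPRX or XPASS procedure, and the non-linear super-learning step is not shown. The function names, the use of the population SD (`statistics.pstdev`) and the least-squares weighting are all assumptions.

```python
"""Linear combination of two standardised population-specific PRS (illustrative sketch)."""
from statistics import fmean, pstdev


def standardize(scores):
    """Z-score a PRS vector: subtract its mean and divide by its population SD."""
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def combine_two_prs(prs_a, prs_b, y):
    """Fit y ~ w0 + w1*z_a + w2*z_b by least squares; return (w1, w2) and the final PRS.

    z_a and z_b both have mean zero, so the slopes come from the 2x2 normal
    equations on centred y (solved with Cramer's rule). The intercept does not
    change how individuals are ranked, so it is left out of the final score.
    """
    za, zb = standardize(prs_a), standardize(prs_b)
    my = fmean(y)
    yc = [t - my for t in y]
    saa = sum(a * a for a in za)
    sbb = sum(b * b for b in zb)
    sab = sum(a * b for a, b in zip(za, zb))
    say = sum(a * t for a, t in zip(za, yc))
    sby = sum(b * t for b, t in zip(zb, yc))
    det = saa * sbb - sab * sab  # zero if the two scores are perfectly collinear
    w1 = (say * sbb - sab * sby) / det
    w2 = (saa * sby - sab * say) / det
    final = [w1 * a + w2 * b for a, b in zip(za, zb)]
    return (w1, w2), final
```

Standardizing first puts the two scores on a common scale. The fitted weights can then be read as the relative contribution of each population's score to the combined predictor.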
“…To address this need, there has been an increase in the number of GWAS focused on non-European populations [8][9][10][11][12][13][14][15], complemented by the developments of various models tailored for multi-population PRS predictions. These models employ different strategies to leverage multiple GWAS results, including utilizing multiple populations, assuming sparse distributions for genetic risk variants across populations, and accounting for genetic correlations among populations [16][17][18][19][20][21][22]. However, to our knowledge, there is no method that integrates all these components under a coherent framework in the absence of individual-level validation data, a common situation in real data applications.…”
Section: Introduction (mentioning)
confidence: 99%