An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction

Zhang, Jingning; Zhan, Junpeng; Jin, Jin; Ma, Cheng; Zhao, Ruzhang; O’Connell, Jared; Jiang, Yunxuan; Koelsch, Bertram L.; Zhang, Haoyu; Chatterjee, Nilanjan

doi:10.1101/2023.03.15.532652

Cited by 7 publications

(21 citation statements)

References 66 publications

“…While a large portion of GWAS data currently comes from individuals of European ancestry, it is well-known that polygenic risk scores do not transfer well between individuals of different ancestries, which can impact their utility for patients of non-European ancestry [64][65][66][67][68][69]. Many methods that utilize summary data from multiple populations have already been proposed and demonstrate improved prediction in under-represented populations [25][29][30][70][71][72][73][74]. As genetics research moves towards greater diversity, ALL-Sum serves as a valuable foundation for extension to incorporating data from multiple ancestries, as well as admixed individuals [75][76][77].…”

Section: Discussion (mentioning)

confidence: 99%

“…In high-dimensional problems such as PRS which involve a large number of SNPs, the best tuning parameters may be in the gaps or outside the bounds of the fixed grid being considered [23]. Recent works, in both theory and PRS applications, have demonstrated that ensembling multiple predictors can yield better prediction accuracy than choosing a single best predictor from grid search [24][25][26][27][28][29][30].…”

Section: Main (mentioning)

confidence: 99%
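
The contrast drawn in the statement above, between picking one predictor from a tuning grid and ensembling all of them, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the simulated genotypes, the candidate effect-size vectors, and the use of ordinary least squares as the ensembling step. It is not the procedure of ALL-Sum or of any cited method.

```python
import numpy as np

def r2(y, pred):
    """Squared Pearson correlation between a phenotype and a score."""
    return float(np.corrcoef(y, pred)[0, 1] ** 2)

def best_single(scores, y):
    """Grid search: index and R^2 of the single best candidate score."""
    vals = [r2(y, scores[:, k]) for k in range(scores.shape[1])]
    k = int(np.argmax(vals))
    return k, vals[k]

def fit_ensemble(scores, y):
    """Least-squares weights (intercept first) over all candidate scores."""
    X = np.column_stack([np.ones(len(y)), scores])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def apply_ensemble(w, scores):
    """Weighted combination of candidate scores."""
    return w[0] + scores @ w[1:]

# Simulated data (hypothetical): n individuals, p SNPs, K tuning settings.
rng = np.random.default_rng(0)
n, p, K = 400, 50, 6
G = rng.standard_normal((n, p))
beta = rng.standard_normal(p) * (rng.random(p) < 0.3)
y = G @ beta + rng.standard_normal(n)

# Each column stands in for the effect sizes fitted at one tuning setting.
noise_levels = np.linspace(0.5, 1.5, K)
betas = np.stack([beta + s * rng.standard_normal(p) for s in noise_levels], axis=1)
scores = G @ betas  # n x K matrix of candidate PRS

tune, test = slice(0, 200), slice(200, 400)
k_best, r2_single_tune = best_single(scores[tune], y[tune])
w = fit_ensemble(scores[tune], y[tune])
r2_ens_tune = r2(y[tune], apply_ensemble(w, scores[tune]))

# Held-out comparison; which approach wins here depends on the data.
r2_single_test = r2(y[test], scores[test][:, k_best])
r2_ens_test = r2(y[test], apply_ensemble(w, scores[test]))
print(f"single: {r2_single_test:.3f}  ensemble: {r2_ens_test:.3f}")
```

On the tuning sample the ensemble can never fit worse than the best single score, because the single-score model is nested within it. The held-out comparison is the informative one, and its outcome is not guaranteed.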

Ensembled best subset selection using summary statistics for polygenic risk prediction

Chen, Zhang, Mazumder et al. 2023. Preprint. Self Cite.

Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2-penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
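
For orientation, a summary-statistic-based regression with a combined L0 and L2 penalty is commonly written in the generic form below. This is an assumed textbook-style formulation, not one taken from the abstract: the symbols R (a reference-panel LD matrix), r (marginal SNP-trait correlations from GWAS summary statistics), and the tuning parameters λ0 and λ2 are illustrative, and the exact objective used by ALL-Sum may differ.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \; \beta^{\top} R \beta \;-\; 2\, r^{\top} \beta
    \;+\; \lambda_{0} \lVert \beta \rVert_{0}
    \;+\; \lambda_{2} \lVert \beta \rVert_{2}^{2}
```

The L0 term counts nonzero effects and so controls sparsity, while the L2 term shrinks the retained effects. Each pair (λ0, λ2) on a grid yields one candidate estimate, and these candidates are what an ensembling step would aggregate.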

“…Similarly, SDPRX linear 2 and XPASS linear 2 linearly combined the standardized PRS of two populations to obtain the final PRS. Conversely, MUSSEL super max and PROSPER super max consider a super learning step that substitutes the linear combination by the non-linear machine learning techniques, and use the ensemble learning instead of selecting the optimally-tuned parameters, following their original work [18,19].…”

Section: Existing Methods (mentioning)

confidence: 99%
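
The linear-combination step described in the statement above, in which standardized PRS from two populations are combined into a final PRS, can be sketched as follows. The simulated scores, the function names, and the choice of least squares for the weights are all assumptions made for illustration. This is not the implementation of SDPRX, XPASS, or JointPRS.

```python
import numpy as np

def standardize(score, ref):
    """Center and scale a score by the mean and SD of a reference sample."""
    return (score - ref.mean()) / ref.std()

def fit_two_population_weights(prs_a, prs_b, y):
    """Least-squares weights for two standardized PRS against a centered phenotype."""
    X = np.column_stack([standardize(prs_a, prs_a), standardize(prs_b, prs_b)])
    w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return w

def combine(prs_a, prs_b, w, ref_a, ref_b):
    """Final PRS: weighted sum of the two standardized scores."""
    return w[0] * standardize(prs_a, ref_a) + w[1] * standardize(prs_b, ref_b)

# Simulated tuning sample (hypothetical): two scores that track a shared signal
# on different scales, as scores derived from two populations might.
rng = np.random.default_rng(1)
n = 300
signal = rng.standard_normal(n)
prs_target = signal + rng.standard_normal(n)
prs_eur = 2.0 * signal + 3.0 + rng.standard_normal(n)
y = signal + 0.5 * rng.standard_normal(n)

w = fit_two_population_weights(prs_target, prs_eur, y)
final_prs = combine(prs_target, prs_eur, w, prs_target, prs_eur)
```

Standardizing first puts the two scores on a common scale, so the fitted weights reflect relative predictive contribution instead of differences in units.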

“…To address this need, there has been an increase in the number of GWAS focused on non-European populations [8][9][10][11][12][13][14][15], complemented by the developments of various models tailored for multi-population PRS predictions. These models employ different strategies to leverage multiple GWAS results, including utilizing multiple populations, assuming sparse distributions for genetic risk variants across populations, and accounting for genetic correlations among populations [16][17][18][19][20][21][22]. However, to our knowledge, there is no method that integrates all these components under a coherent framework in the absence of individual-level validation data, a common situation in real data applications.…”

Section: Introduction (mentioning)

confidence: 99%

“…We illustrate the benefits of integrating genetic correlation structures and jointly using multiple populations over exclusive reliance on the target and the auxiliary European groups. In addition, we compare JointPRS with other methods, including PRS-CSx [17], MUSSEL [18], PROSPER [19], SDPRX [20], XPASS [21], and BridgePRS [22], with and without a validation dataset. Our results illustrate the distinct contributions of joint modelling and genetic correlations to prediction accuracy, and suggest JointPRS as a promising method for predicting complex traits across populations with the auto-tuned JointPRS achieving comparable performances as top-performing alternative approaches that require heavy tuning on individual-level validation sets.…”

Section: Introduction (mentioning)

confidence: 99%

JointPRS: A Data-Adaptive Framework for Multi-Population Genetic Risk Prediction Incorporating Genetic Correlation

Xu, Zhou, Jiang et al. 2023. Preprint.

The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian, nine quantitative traits and one binary trait in African, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.

Tuning parameters for polygenic risk score methods using GWAS summary statistics from training data

Jiang, Chen, Girgenti et al. 2024. Nat Commun.

Various polygenic risk scores (PRS) methods have been proposed to combine the estimated effects of single nucleotide polymorphisms (SNPs) to predict genetic risks for common diseases, using data collected from genome-wide association studies (GWAS). Some methods require external individual-level GWAS dataset for parameter tuning, posing privacy and security-related concerns. Leaving out partial data for parameter tuning can also reduce model prediction accuracy. In this article, we propose PRStuning, a method that tunes parameters for different PRS methods using GWAS summary statistics from the training data. PRStuning predicts the PRS performance with different parameters, and then selects the best-performing parameters. Because directly using training data effects tends to overestimate the performance in the testing data, we adopt an empirical Bayes approach to shrinking the predicted performance in accordance with the genetic architecture of the disease. Extensive simulations and real data applications demonstrate PRStuning’s accuracy across PRS methods and parameters.
