The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Eight polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDpred1, LDpred2, PRScs, DBSLMM and SBayesR. Their performance was evaluated by predicting outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value thresholds and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation and infinitesimal models (with no validation sample), and multi-polygenic score elastic net models. LDpred2, lassosum and PRScs performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 16–18% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best methods were PRScs, DBSLMM and SBayesR. PRScs pseudovalidation was only 3% worse than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Within a reference-standardized framework, the best polygenic prediction was achieved using LDpred2, lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
Background The predictive utility of polygenic scores is increasing, and many polygenic scoring methods are available, but it is unclear which method performs best. This study evaluates the predictive utility of polygenic scoring methods within a reference-standardized framework, which uses a common set of variants and reference-based estimates of linkage disequilibrium and allele frequencies to construct scores. Methods Six polygenic score methods were tested: p-value thresholding and clumping (pT+clump), SBLUP, lassosum, LDPred, PRScs and SBayesR. Their performance was evaluated by predicting outcomes in UK Biobank and the Twins Early Development Study (TEDS). Strategies to identify optimal p-value threshold and shrinkage parameters were compared, including 10-fold cross-validation, pseudovalidation (with no validation sample), and multi-polygenic score elastic net models. Results lassosum, PRScs and LDPred performed strongly using 10-fold cross-validation to identify the most predictive p-value threshold or shrinkage parameter, giving a relative improvement of 14–17% over pT+clump in the correlation between observed and predicted outcome values. Using pseudovalidation, the best method was PRScs, with a relative improvement of >11% over other pseudovalidation methods (lassosum, SBLUP, SBayesR, LDPred), and only 1% less than the best polygenic score identified by 10-fold cross-validation. Elastic net models containing polygenic scores based on a range of parameters consistently improved prediction over any single polygenic score. Conclusion Within a reference-standardized framework, the best polygenic prediction was achieved using lassosum and PRScs, modeling multiple polygenic scores derived using multiple parameters. This study will help researchers performing polygenic score studies to select the most powerful and predictive analysis methods.
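As a rough illustration of the 10-fold cross-validation strategy described in the abstract above, the following Python sketch picks the most predictive of several candidate scores by the correlation between observed and out-of-fold predicted outcomes. It is not the study's pipeline: the data are synthetic, the score names (`pT_0.05`, `pT_1`, `pT_5e-8`) and all function names are hypothetical, and a single-predictor linear model is assumed.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def cv_correlation(score, outcome, k=10, seed=1):
    """Correlation between observed outcomes and out-of-fold predictions."""
    idx = list(range(len(score)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    pred = [0.0] * len(score)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([score[i] for i in train],
                                    [outcome[i] for i in train])
        for i in held_out:
            pred[i] = intercept + slope * score[i]
    return pearson(pred, outcome)

def select_best(scores, outcome, k=10):
    """Return the candidate with the highest cross-validated correlation."""
    cv_r = {name: cv_correlation(s, outcome, k) for name, s in scores.items()}
    return max(cv_r, key=cv_r.get), cv_r

# Synthetic example (assumption): three candidate scores that track the same
# underlying signal with different amounts of noise.
rng = random.Random(0)
n = 500
signal = [rng.gauss(0, 1) for _ in range(n)]
outcome = [g + rng.gauss(0, 1) for g in signal]
scores = {
    "pT_0.05": [g + rng.gauss(0, 0.5) for g in signal],
    "pT_1": [g + rng.gauss(0, 2.0) for g in signal],
    "pT_5e-8": [g + rng.gauss(0, 5.0) for g in signal],
}
best, cv_r = select_best(scores, outcome)
print(best, {name: round(r, 3) for name, r in cv_r.items()})
```

The same selection loop would apply to shrinkage parameters in place of p-value thresholds, with one candidate score per parameter value.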
Background Integration of functional genomic annotations when estimating polygenic risk scores (PRS) can provide insight into aetiology and improve risk prediction. This study explores the predictive utility of gene expression risk scores (GeRS), calculated using imputed gene expression and transcriptome-wide association study (TWAS) results. Methods The predictive utility of GeRS was evaluated using 12 neuropsychiatric and anthropometric outcomes measured in two target samples: UK Biobank and the Twins Early Development Study (TEDS). GeRS were calculated based on imputed gene expression levels and TWAS results, using 53 gene expression-genotype panels, termed SNP-weight sets, capturing expression across a range of tissues. We compare the predictive utility of elastic net models containing GeRS within and across SNP-weight sets, and models containing both GeRS and PRS. We estimate the proportion of SNP-based heritability attributable to cis-regulated gene expression. Results GeRS significantly predicted a range of outcomes, with elastic net models combining GeRS across SNP-weight sets improving prediction. GeRS were less predictive than PRS, but models combining GeRS and PRS improved prediction for several outcomes, with relative improvements ranging from 0.3% for Height (p = 0.023) to 4% for Rheumatoid Arthritis (p = 5.9 × 10⁻⁸). The proportion of SNP-based heritability attributable to cis-regulated expression was modest for most outcomes, even when restricting GeRS to colocalised genes. Conclusion GeRS represent a component of PRS and could be useful for functional stratification of genetic risk. Only in specific circumstances can GeRS substantially improve prediction over PRS alone. Future research considering functional genomic annotations when estimating genetic risk is warranted.
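The elastic net models mentioned in the abstracts above combine several scores into one predictor. The sketch below shows the general idea with a small coordinate-descent elastic net in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the penalty follows the common parameterisation (1/2n)·||y − Xb||² + λ[α·|b|₁ + (1 − α)/2·|b|₂²], and the function names and the values of `lam` and `alpha` are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def soft_threshold(z, g):
    """Shrink z towards zero by g (the lasso part of the update)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def fit_elastic_net(columns, y, lam, alpha=0.5, n_iter=100):
    """Coordinate descent on predictors standardized to mean 0, variance 1."""
    n = len(y)
    means = [sum(c) / n for c in columns]
    sds = [(sum((v - m) ** 2 for v in c) / n) ** 0.5
           for c, m in zip(columns, means)]
    X = [[(v - m) / s for v in c] for c, m, s in zip(columns, means, sds)]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # Covariance of predictor j with the residual excluding its own fit
            rho = sum(x * r for x, r in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * x for r, x in zip(resid, xj)]
                beta[j] = new
    return beta, means, sds, y_mean

def predict(columns, beta, means, sds, y_mean):
    """Apply the fitted model using the training means and SDs."""
    n = len(columns[0])
    return [y_mean + sum(b * (c[i] - m) / s
                         for b, c, m, s in zip(beta, columns, means, sds))
            for i in range(n)]

# Synthetic example (assumption): four scores, each a noisy view of one signal.
rng = random.Random(0)
n = 2000
signal = [rng.gauss(0, 1) for _ in range(n)]
outcome = [g + rng.gauss(0, 1) for g in signal]
scores = [[g + rng.gauss(0, 2) for g in signal] for _ in range(4)]
train, test = slice(0, 1000), slice(1000, n)

beta, means, sds, y_mean = fit_elastic_net(
    [s[train] for s in scores], outcome[train], lam=0.01)
pred = predict([s[test] for s in scores], beta, means, sds, y_mean)
r_combined = pearson(pred, outcome[test])
r_single = max(pearson(s[test], outcome[test]) for s in scores)

# A very large penalty shrinks every coefficient to exactly zero.
beta_null, *_ = fit_elastic_net(
    [s[train] for s in scores], outcome[train], lam=100.0)
print(round(r_combined, 3), round(r_single, 3), beta_null)
```

In this toy setting the scores carry independent noise, so combining them helps by construction; real scores from neighbouring parameters or tissues are typically much more correlated, and the gain would be expected to be smaller.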
Background People with bipolar disorder (BPD) are more likely to die prematurely, which is partly attributed to comorbid cardiometabolic traits. Previous studies report cardiometabolic abnormalities in BPD, but their shared aetiology remains poorly understood. This study examined the phenotypic associations and shared genetic aetiology between BPD and various cardiometabolic traits. Methods In a subset of the UK Biobank sample (N = 61 508) we investigated phenotypic associations between BPD (n = 4186 cases) and cardiometabolic traits, represented by biomarkers, anthropometric traits and cardiometabolic diseases. To determine shared genetic aetiology in European ancestry, polygenic risk scores (PRS) and genetic correlations were calculated between BPD and cardiometabolic traits. Results Several traits were significantly associated with increased risk for BPD, namely low total cholesterol, low high-density lipoprotein cholesterol, high triglycerides, high glycated haemoglobin, low systolic blood pressure, high body mass index and high waist-to-hip ratio, as well as diagnoses of stroke, coronary artery disease and type 2 diabetes. BPD was associated with higher polygenic risk for triglycerides, waist-to-hip ratio, coronary artery disease and type 2 diabetes. Shared genetic aetiology persisted for coronary artery disease when correcting PRS associations for cardiometabolic base phenotypes. Associations were not replicated using genetic correlations. Conclusions This large study identified increased phenotypic cardiometabolic abnormalities in BPD participants. The comorbidity of coronary artery disease may be based on shared genetic aetiology. These results motivate hypothesis-driven research to consider individual cardiometabolic traits rather than a composite metabolic syndrome when attempting to disentangle driving mechanisms of cardiometabolic abnormalities in BPD.