The application of polygenic risk scores (PRS) has become routine across genetic research. Among a range of applications, PRS are exploited to assess shared aetiology between phenotypes, to evaluate the predictive power of genetic data for use in clinical settings, and as part of experimental studies in which, for example, experiments are performed on individuals, or their biological samples (e.g. tissues, cells), at the tails of the PRS distribution and contrasted. As GWAS sample sizes increase and PRS become more powerful, they are set to play a key role in personalised medicine. However, despite the growing application and importance of PRS, there are limited guidelines for performing PRS analyses, which can lead to inconsistency between studies and misinterpretation of results. Here we provide detailed guidelines for performing polygenic risk score analyses relevant to different methods for their calculation, outlining standard quality control steps and offering recommendations for best practice. We also discuss different methods for the calculation of PRS, common misconceptions regarding the interpretation of results and future challenges.

Genome-wide association studies (GWAS) have identified a large number of genetic variants, typically single nucleotide polymorphisms (SNPs), associated with a wide range of complex traits [1-3]. However, the majority of these variants have a small effect and typically correspond to a small fraction of truly associated variants, meaning that they have limited predictive power [4-6]. Using a linear mixed model in the Genome-wide Complex Trait Analysis software (GCTA) [7], Yang et al. (2010) demonstrated that much of the heritability of height can be explained by evaluating the effects of all SNPs simultaneously [6].
Subsequently, statistical techniques such as LD score regression (LDSC) [8,9] and the polygenic risk score (PRS) method [4,10] have also aggregated the effects of variants across the genome to estimate heritability, to infer genetic overlap between traits and to predict phenotypes based on genetic profile or that of other phenotypes [4,5,8-10].

While GCTA, LDSC and PRS can all be exploited to infer heritability and shared aetiology among complex traits, PRS is the only approach that provides an estimate of genetic propensity to a trait at the individual level.
Background: Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects to collect large genetic and phenotypic data, providing unprecedented opportunity for genetic discovery and applications. To process the large-scale data provided by such biobank resources, highly efficient and scalable methods and software are required.

Results: Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and alternative PRS software, LDpred and lassosum, while having comparable predictive power.

Conclusion: PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set–based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
In the standard approach [4,11-13], polygenic risk scores are calculated by computing the sum of risk alleles corresponding to a phenotype of interest in each individual, weighted by the effect size estimate of the most powerful GWAS on the phenotype. Studies have shown that substantially greater predictive power can usually be achieved by using PRS rather than a small number of genome-wide significant SNPs [11,14,15]. As an individual-level genome-wide genetic proxy of a trait, PRS are suitable for a range of applications. For example, as well as identifying shared aetiology among traits, PRS h...
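The weighted sum described in the standard approach can be illustrated with a minimal Python sketch. It is not taken from any of the cited software: the SNP identifiers, effect sizes, genotypes and the function name `polygenic_risk_score` are all hypothetical, and the steps a real analysis needs (quality control, allele matching, handling of linkage disequilibrium, P-value thresholding) are deliberately left out.

```python
from typing import Dict


def polygenic_risk_score(
    risk_allele_counts: Dict[str, int],
    effect_sizes: Dict[str, float],
) -> float:
    """Sum of risk-allele counts (0, 1 or 2), each weighted by the
    GWAS effect-size estimate for that SNP.

    Only SNPs present in both inputs contribute to the score.
    """
    shared_snps = sorted(risk_allele_counts.keys() & effect_sizes.keys())
    return sum(effect_sizes[snp] * risk_allele_counts[snp] for snp in shared_snps)


# Hypothetical GWAS effect-size estimates (e.g. betas or log odds ratios).
effect_sizes = {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20}

# Hypothetical risk-allele counts for two individuals in the target sample.
individuals = {
    "ind_A": {"rs1": 2, "rs2": 0, "rs3": 1},
    "ind_B": {"rs1": 0, "rs2": 2, "rs3": 0},
}

scores = {
    name: polygenic_risk_score(counts, effect_sizes)
    for name, counts in individuals.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this toy example the score is computed per individual, which is what makes PRS usable as an individual-level proxy of a trait. Some implementations additionally divide by the number of alleles included; that choice is not shown here.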