Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies

Song, Shuang; Jiang, Wei; Hou, Lin; Zhao, Hongyu

doi:10.1371/journal.pcbi.1007565

Cited by 40 publications

(51 citation statements)

References 35 publications

(44 reference statements)


“…Therefore, genes in these regions are less likely to be captured in the post-GWAS analysis. For example, we identified significant local genetic correlation in a region (chr4:145,024,452-148,047,972; Supplementary file) for hip circumference [28, 29]. However, the most significant p-value for IPF of the SNPs in that region is 7·8E-4 (rs2055059).…”

Section: Results (mentioning)

confidence: 99%

“…Besides phenotype-level correlation, we also investigated the relationship between a polygenic risk score (PRS) of IPF and the 670 phenotypes in UKBB. We used the R package EB-PRS [28] to obtain PRS for individuals from the UKBB based on IPF summary statistics [4]. Following standard quality control criteria, we restricted the analysis to autosomal variants with genotype missing rate per marker < 0·05, imputation information score above 0·3, Hardy-Weinberg Equilibrium p-value > 1e-4, and minor allele frequency (MAF) < 0·01.…”

Section: Polygenic Risk Score Correlation (mentioning)

confidence: 99%


Integrative Analyses Reveal Novel Disease-associated Loci and Genes for Idiopathic Pulmonary Fibrosis

Chen

Zhang

Adams

et al. 2021

Preprint

Self Cite


Background: Although genome-wide association studies have identified many genomic regions associated with idiopathic pulmonary fibrosis (IPF), the causal genes and functions remain largely unknown. Many bulk and single-cell expression data have become available for IPF, and there is increasing evidence suggesting a shared genetic basis between IPF and other diseases. Methods: By leveraging shared genetic information and transcriptome data, we conducted an integrative analysis to identify novel genes for IPF. We first considered observed phenotypes, polygenic risk scores, and genetic correlations to investigate associations between IPF and other traits in the UK Biobank. We then performed local genetic correlation analysis and cross-tissue transcriptome-wide association analysis (TWAS) to identify IPF genes. We further prioritized genes using bulk and single-cell gene expression data. Findings: We identified 25 traits correlated with IPF on the phenotype level and seven traits genetically correlated with IPF. Using local genetic correlation, we identified 12 candidate genes across 14 genomic regions, including the POT1 locus (p-value = 4.1E-4), which contained variants with protective effects on lung cancer but increasing IPF risk. Using TWAS, we identified 36 genes, including 12 novel genes for IPF. Annotation-stratified heritability estimation and differential expression analysis of downstream-regulated genes suggested regulatory roles of two candidate genes, MAFK and SMAD2, on IPF. Interpretation: Our integrative analysis identified new genes for IPF susceptibility and expanded the understanding of the complex genetic architecture of IPF. Funding: NIHR Leicester Biomedical Research Centre, Three Lakes Partners, the National Institutes of Health, the National Science Foundation, U01HL145567, and UH2HL123886.


“…In addition to P + T approach, several Bayesian approaches for PRS calculation have been continuously developed. We therefore used P + T method and Bayesian approaches; PRScs [28] and EB-PRS [29] with and without reference LD information, respectively. Using these multiple approach for PRS calculation, we assessed the discrimination of PRS between PD cases and controls using area under curve (AUC) metrics.…”

Section: Results (mentioning)

confidence: 99%

“…For P + T approach, we conducted LD clump using PLINK [39] with a LD parameter of 0.5 and P value thresholds were set ranging from 5.00E-02 to 1.00E-20. Bayesian approaches including PRScs [28] and EB-PRS [29] were conducted with default parameters. For PRScs, we used reference LD information of 1KGP3 for East Asian populations.…”

Section: Methods (mentioning)

confidence: 99%
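
The P + T scoring and AUC evaluation described in the two statements above can be illustrated with a minimal Python sketch. It is not the cited authors' pipeline: LD clumping (done with PLINK in the source) is assumed to have been applied already, and the function names `threshold_prs` and `auc` as well as the toy inputs are hypothetical. Only the p-value threshold range (5.00E-02 to 1.00E-20) comes from the quoted text.

```python
import numpy as np

def threshold_prs(genotypes, betas, pvalues, p_threshold):
    """Score each individual as the dosage-weighted sum of effect sizes,
    using only SNPs whose GWAS p-value is at or below the threshold.
    Assumes the SNP set has already been LD-clumped (not done here)."""
    genotypes = np.asarray(genotypes, dtype=float)  # shape (n_individuals, n_snps)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) <= p_threshold
    return genotypes[:, keep] @ betas[keep]

def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); ties get average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)

# Hypothetical grid spanning the range quoted in the statement above.
P_THRESHOLDS = [5e-2, 1e-5, 1e-10, 1e-20]
```

In use, one would compute `threshold_prs` for each value in `P_THRESHOLDS` and compare the resulting `auc` values between cases and controls on held-out data.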

Evaluation of low-pass genome sequencing in polygenic risk score calculation for Parkinson’s disease

Kim

Shin

Kwon

et al. 2021

Hum Genomics


Background: Low-pass sequencing (LPS) has been extensively investigated for applicability to various genetic studies due to its advantages over genotype array data including cost-effectiveness. Predicting the risk of complex diseases such as Parkinson’s disease (PD) using polygenic risk score (PRS) based on the genetic variations has shown decent prediction accuracy. Although ultra-LPS has been shown to be effective in PRS calculation, array data has been favored to the majority of PRS analysis, especially for PD. Results: Using eight high-coverage WGS, we assessed imputation approaches for downsampled LPS data ranging from 0.5× to 7.0×. We demonstrated that uncertain genotype calls of LPS diminished imputation accuracy, and an imputation approach using genotype likelihoods was plausible for LPS. Additionally, comparing imputation accuracies between LPS and simulated array illustrated that LPS had higher accuracies particularly at rare frequencies. To evaluate ultra-low coverage data in PRS calculation for PD, we prepared low-coverage WGS and genotype array of 87 PD cases and 101 controls. Genotype imputation of array and downsampled LPS were conducted using a population-specific reference panel, and we calculated risk scores based on the PD-associated SNPs from an East Asian meta-GWAS. The PRS models discriminated cases and controls as previously reported when both LPS and genotype array were used. Also strong correlations in PRS models for PD between LPS and genotype array were discovered. Conclusions: Overall, this study highlights the potentials of LPS under 1.0× followed by genotype imputation in PRS calculation and suggests LPS as attractive alternatives to genotype array in the area of precision medicine for PD.


“…Recently published methods in this area include PRS-CS [13] and SBayesR [14]. Other methods, such as EBPRS [15], leverage the available GWAS data to estimate a distribution of SNP effect sizes that is leveraged to adjust the marginal SNP effects. These methods do not necessitate individual level data.…”

Section: Introduction (mentioning)

confidence: 99%
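
The idea in the statement above, estimating a distribution of SNP effect sizes from the GWAS summary statistics and using it to adjust the marginal effects, can be sketched with a generic two-group empirical Bayes model. This is an illustration under stated assumptions and not the EB-PRS implementation: z-scores are assumed independent and drawn from a mixture of a null N(0, 1) component and a non-null N(0, 1 + s2) component, and the names `fit_two_group` and `shrink` are hypothetical.

```python
import numpy as np

def _normal_pdf(x, var):
    """Density of a zero-mean normal with variance `var`."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def _nonnull_weight(z, pi, s2):
    """Posterior probability that each z-score comes from the non-null group."""
    alt = pi * _normal_pdf(z, 1.0 + s2)
    null = (1.0 - pi) * _normal_pdf(z, 1.0)
    return alt / (alt + null)

def fit_two_group(z, n_iter=200):
    """EM estimates of the non-null proportion `pi` and the non-null
    effect variance `s2` (assumed model, see the lead-in)."""
    z = np.asarray(z, dtype=float)
    pi, s2 = 0.1, 1.0  # arbitrary starting values
    for _ in range(n_iter):
        w = _nonnull_weight(z, pi, s2)            # E-step
        pi = w.mean()                             # M-step: mixing proportion
        s2 = max((w * z**2).sum() / w.sum() - 1.0, 1e-8)  # M-step: variance
    return pi, s2

def shrink(z, pi, s2):
    """Posterior mean of the true standardized effect given each z-score."""
    z = np.asarray(z, dtype=float)
    return _nonnull_weight(z, pi, s2) * (s2 / (1.0 + s2)) * z
```

Under this model the shrunken values are never larger in magnitude than the observed z-scores, and z-scores near zero are pulled almost entirely to zero, which is the qualitative behavior one would expect from adjusting marginal effects with an estimated effect size distribution.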

Penalized regression and model selection methods for polygenic scores on summary statistics

Pattee

Pan

2020

PLoS Comput Biol


Polygenic scores quantify the genetic risk associated with a given phenotype and are widely used to predict the risk of complex diseases. There has been recent interest in developing methods to construct polygenic risk scores using summary statistic data. We propose a method to construct polygenic risk scores via penalized regression using summary statistic data and publicly available reference data. Our method bears similarity to existing method LassoSum, extending their framework to the Truncated Lasso Penalty (TLP) and the elastic net. We show via simulation and real data application that the TLP improves predictive accuracy as compared to the LASSO while imposing additional sparsity where appropriate. To facilitate model selection in the absence of validation data, we propose methods for estimating model fitting criteria AIC and BIC. These methods approximate the AIC and BIC in the case where we have a polygenic risk score estimated on summary statistic data and no validation data. Additionally, we propose the so-called quasi-correlation metric, which quantifies the predictive accuracy of a polygenic risk score applied to out-of-sample data for which we have only summary statistic information. In total, these methods facilitate estimation and model selection of polygenic risk scores on summary statistic data, and the application of these polygenic risk scores to out-of-sample data for which we have only summary statistic information. We demonstrate the utility of these methods by applying them to GWA studies of lipids, height, and lung cancer.
