2018
DOI: 10.1101/252270
Preprint

Polygenic scores for UK Biobank scale data

Abstract: Polygenic scores (PGS) are estimated scores representing the genetic tendency of an individual for a disease or trait and have become an indispensable tool in a variety of analyses. Typically they are a linear combination of the genotypes of a large number of SNPs, with the weights calculated from an external source, such as summary statistics from large meta-analyses. Recently cohorts with genetic data have become very large, rendering external summary statistics superfluous. Making use of raw data in calculati…
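The linear combination described in the abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0, 1 or 2) and that per-SNP weights are already available; the function name `polygenic_score` and the toy numbers are hypothetical and are not taken from the preprint.

```python
from typing import Sequence


def polygenic_score(genotypes: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of one individual's per-SNP allele counts."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must have the same length")
    return sum(g * w for g, w in zip(genotypes, weights))


# Hypothetical toy data: three SNPs, two individuals (assumed, for illustration only).
weights = [0.5, -0.25, 1.0]
individuals = {"A": [0, 1, 2], "B": [2, 2, 0]}

# One score per individual.
scores = {name: polygenic_score(g, weights) for name, g in individuals.items()}
print(scores)
```

In practice the weights would come from summary statistics or, as the abstract suggests, from the raw data of a large cohort; this sketch only shows the scoring step.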

Cited by 12 publications (10 citation statements)
References 42 publications (52 reference statements)
“…In the absence of an independent data set, the target sample can be subdivided into training and validation data sets, and this process can be repeated with different partitions of the sample, e.g. performing 10-fold cross-validation [56,66,67], to obtain more robust model estimates.…”
Section: Overfitting in PRS Association Testing (mentioning)
confidence: 99%
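The repeated partitioning described in the statement above can be sketched as follows. This is a minimal illustration of k-fold index splitting using only the standard library; the helper name `k_fold_splits`, the seed, and the sample size are assumptions, not details from the cited work.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training, validation) index lists for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, validation in enumerate(folds):
        # Training set is every fold except the current validation fold.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Hypothetical target sample of 25 individuals, split 10 ways.
splits = list(k_fold_splits(25, k=10))
for training, validation in splits:
    # A model would be fitted on `training` and evaluated on `validation` here.
    pass
```

Each individual appears in exactly one validation fold, so every score is evaluated on data that was not used to fit the model for that fold.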
“…It has also been proposed to use external or internal validation to choose tuning parameters and avoid permutations. However, external validation data sets are often not available, especially for rarely studied phenotypes (Mak, Porsch, Choi, & Sham, 2018), and in smaller samples, splitting the data into training and validation sets can decrease power. As an alternative to the optimization approach, one could a priori choose a single tuning parameter setting (e.g., fixing the p-value threshold and LD pruning level) to construct a single PRS.…”
Section: Introduction (mentioning)
confidence: 99%
“…Our study is not without limitations. We could potentially have overfitting in our model (Mak et al., 2018) due to any overlap between the discovery data and the shrinkage applied to the GWAS effect size based on HCHS/SOL data, but it is unlikely given the different data sources. The study design involves complex sampling, and therefore, we utilized complex survey methodology in our analysis.…”
Section: Discussion (mentioning)
confidence: 99%