Linkage disequilibrium: the non-independent association of two alleles in a population.
Most neuropsychiatric disorders are highly polygenic, implicating hundreds to thousands of causal genetic variants that span much of the genome. This widespread polygenicity complicates biological understanding because no single variant can explain disease etiology. One strategy for advancing biological insight is to seek convergent functions among the large set of variants and to map them to a smaller set of disease-relevant genes and pathways. Accordingly, functional genomic resources that provide data on intermediate molecular phenotypes, such as gene expression and methylation status, can be leveraged to functionally annotate variants and map them to genes. Such molecular quantitative trait locus (QTL) mappings can be integrated with genome-wide association studies to make sense of the polygenic signal that underlies complex disease. Other resources, which provide data on the three-dimensional structure of chromatin and on the functional importance of specific genomic regions, can be integrated similarly. Mapped genes can then be tested for convergence in biological function, tissue, cell type, or developmental stage. In this review, we provide an overview of functional genomic resources and methods that can be used to interpret results from genome-wide association studies, and we discuss current challenges for biological understanding and the future requirements to overcome them.
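One simple way to test whether GWAS-mapped genes converge on a pathway, tissue, or cell-type gene set is a one-sided hypergeometric (over-representation) test. The sketch below is an illustration under simplifying assumptions (a fixed gene background and binary set membership) and is not the method of any specific resource reviewed here; it ignores gene length and correlation between neighboring genes, which dedicated gene-set tools typically account for. The function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(n_genome, n_set, n_mapped, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X ~ Hypergeometric: n_mapped genes are drawn without replacement from a
    background of n_genome genes, n_set of which belong to the gene set.
    """
    upper = min(n_set, n_mapped)
    # Sum the hypergeometric probability mass from the observed overlap
    # up to the largest overlap that is possible.
    tail = sum(
        comb(n_set, k) * comb(n_genome - n_set, n_mapped - k)
        for k in range(n_overlap, upper + 1)
    )
    # Exact integer arithmetic; Python's int / int handles the very large values.
    return tail / comb(n_genome, n_mapped)


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 200-gene pathway,
    # 500 GWAS-mapped genes, 15 of which fall in the pathway (about 5 expected by chance).
    print(hypergeometric_enrichment_p(20_000, 200, 500, 15))
```

Exact integer binomial coefficients avoid the underflow that naive floating-point products would hit for genome-sized backgrounds.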
Drug repurposing may offer a solution to the substantial challenges facing de novo drug development. Given that 66% of FDA-approved drugs in 2021 were supported by human genetic evidence, drug repurposing methods based on genome-wide association studies (GWAS), such as drug gene-set analysis, may prove an efficient way to identify new treatments. However, to our knowledge, drug gene-set analysis has not been tested in non-psychiatric phenotypes, and previous implementations may have contained statistical biases when testing groups of drugs. Here, 1,201 drugs were tested for association with hypercholesterolemia, type 2 diabetes, coronary artery disease, asthma, schizophrenia, bipolar disorder, Alzheimer's disease, and Parkinson's disease. We show that drug gene-set analysis can identify clinically relevant drugs (e.g., simvastatin for hypercholesterolemia [p = 2.82E-06]; mitiglinide for type 2 diabetes [p = 2.66E-07]) and drug groups (e.g., C10A for coronary artery disease [p = 2.31E-05]; insulin secretagogues for type 2 diabetes [p = 1.09E-11]) for non-psychiatric phenotypes. Additionally, when the overlap of genes between drug gene-sets is taken into account, we find no groups containing approved drugs for the psychiatric phenotypes tested. However, several drug groups that may contain repurposing candidates were identified for psychiatric phenotypes, such as ATC codes J02A (p = 2.99E-09) and N07B (p = 0.0001) for schizophrenia. Our results demonstrate that clinically relevant drugs and groups of drugs can be identified using drug gene-set analysis for a number of phenotypes. These findings have implications for quickly identifying novel treatments based on the genetic mechanisms underlying diseases.
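A drug gene-set test can be framed as a competitive test: are the genes targeted by a drug more strongly associated with the phenotype than other genes? Below is a minimal sketch, assuming gene-level z-scores from a gene-based GWAS analysis are already available. It regresses the z-scores on a single 0/1 drug-target indicator, which reduces to a pooled-variance comparison of means. The function name is hypothetical, and the p-value uses a normal approximation to the t distribution. Unlike full implementations (for example, MAGMA's gene-set analysis), it includes no covariates for gene size or density, and it models neither correlation between genes nor the overlap between drug gene-sets discussed above.

```python
from statistics import NormalDist, fmean


def competitive_gene_set_test(gene_z, drug_genes):
    """Compare gene-level z-scores of a drug's target genes with all other genes.

    Equivalent to OLS of z on an intercept plus a 0/1 "is a drug target" indicator:
    beta is the difference in mean z, and its standard error uses the pooled
    residual variance. Returns (beta, t, one-sided p) for the alternative that
    drug targets have higher z than non-targets.
    """
    inside = [z for gene, z in gene_z.items() if gene in drug_genes]
    outside = [z for gene, z in gene_z.items() if gene not in drug_genes]
    n1, n0 = len(inside), len(outside)
    if n1 < 1 or n0 < 1 or n1 + n0 < 3:
        raise ValueError("need genes both inside and outside the drug gene-set")
    m1, m0 = fmean(inside), fmean(outside)
    # Residual sum of squares around each group mean (the OLS fitted values).
    rss = sum((z - m1) ** 2 for z in inside) + sum((z - m0) ** 2 for z in outside)
    sigma2 = rss / (n1 + n0 - 2)
    se = (sigma2 * (1 / n1 + 1 / n0)) ** 0.5
    beta = m1 - m0
    t = beta / se
    # Upper-tail normal approximation to the t distribution (fine with thousands of genes).
    p = NormalDist().cdf(-t)
    return beta, t, p
```

Calling it once per drug gene-set and then correcting for the number of drugs tested would mirror the per-drug part of the analysis; testing groups of drugs needs extra care because their gene-sets overlap.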
Gouveia and colleagues (2022) 1 conducted a genome-wide association study (GWAS) of a polygenic risk score (PRS)-derived phenotype (N = 37,784), in which they identified 246 independent loci and 473 lead single-nucleotide polymorphisms (SNPs). This is an enormous increase compared with the most recent and largest GWAS of Alzheimer's disease (AD) 2 (N = 1,126,563), which identified 38 loci. Here we show that the approach applied by Gouveia and colleagues may lead to an inflated false positive rate. In this approach, beta estimates from a recent GWAS of AD 3 were used to construct PRSs in the European UK Biobank 4 sample, using pruning and thresholding 5 with a p-value threshold of 5%. Next, a new case-control phenotype was constructed from the bottom and top 5% of the PRS distribution, removing 90% of the initial sample. Lastly, a GWAS was conducted on this new PRS-derived phenotype. The authors reasoned that by enriching the sample for individuals carrying known AD-associated variants, one may also enrich for unknown AD-associated variants. Our major concern is that the approach used the same SNPs both to construct the phenotype and to predict it. In other words, the phenotype was partly regressed on itself, which can inflate test statistics. We performed simulations roughly emulating the approach (see Methods). In short, we simulated individual phenotypes under a liability threshold model, and genotypes that loosely reflect the genetic architecture of AD 2,3,6 (excluding the APOE locus), comprising 170,000 independent SNPs, of which 1,200 were causal and 168,800 were non-causal (null-SNPs). We then simulated a discovery sample (N = 366,771) such that the PRS explains approximately 5% of the phenotypic variance on the liability scale. We ran a GWAS of AD in this discovery sample and used the estimated betas to construct a PRS in a target sample (N = 300,000).
We then selected individuals in the top and bottom 5% of the PRS distribution (N = 30,000) and ran a second GWAS on this new PRS-derived case-control phenotype. The target cohort overlapped with the discovery cohort to varying degrees (i.e., 0%, 50%, and 100%), noting that the AD GWAS summary statistics used by Gouveia and colleagues (2022) 1 also included UK Biobank. Our results show highly inflated false positive rates in the GWAS of the PRS-derived phenotype (see Fig. 1 and Supplementary Table ). Across all null-SNPs, and with no overlap between discovery and target cohorts, the false positive rate was 0.0024 (s.e.m. = 1 × 10⁻⁵), a 48,000-fold increase compared with a well-controlled false positive rate of 5 × 10⁻⁸ (see Supplementary Fig. 1 for α = 0.05). This inflation is driven by the null-SNPs that were used to construct the PRS-derived phenotype. With no overlap, the false positive rate of these null-SNPs was 0.05 (s.e.m. = 0.0002, a 1 × 10⁶-fold increase), whereas null-SNPs that were not used to construct the PRS-derived phenotype showed no inflation. We also looked at the number of false positive associ...
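To make the mechanism concrete, here is a heavily scaled-down, pure-Python sketch of the simulation design described above; it is not the authors' simulation code. All sizes are assumptions chosen so it runs in seconds: 200 independent SNPs (20 causal), discovery N = 2,000, target N = 4,000, no cohort overlap, liability heritability 0.3 and 50% prevalence. Because the samples are tiny, the second GWAS is evaluated at α = 0.05 rather than 5 × 10⁻⁸, and per-SNP tests use a simple trend test (z = r√n) instead of logistic regression. The expected pattern is that null SNPs included in the PRS are flagged far more often than α, while null SNPs outside the PRS are not.

```python
import math
import random
from statistics import NormalDist

NORMAL = NormalDist()


def simulate_genotypes(n, freqs, rng):
    """Additive 0/1/2 dosages for independent SNPs (no LD), under Hardy-Weinberg."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs] for _ in range(n)]


def liability_phenotype(G, freqs, effects, h2, prevalence, rng):
    """Liability threshold model: case (1) if liability exceeds the (1 - prevalence) quantile."""
    threshold = NORMAL.inv_cdf(1 - prevalence)
    y = []
    for row in G:
        genetic = sum(b * (row[j] - 2 * freqs[j]) / math.sqrt(2 * freqs[j] * (1 - freqs[j]))
                      for j, b in effects.items())
        y.append(1 if genetic + rng.gauss(0, math.sqrt(1 - h2)) > threshold else 0)
    return y


def assoc_test(G, y, j):
    """Trend test for SNP j: (slope of y on dosage, two-sided p from z = r * sqrt(n))."""
    n = len(y)
    g = [row[j] for row in G]
    mg, my = sum(g) / n, sum(y) / n
    sgg = sum((a - mg) ** 2 for a in g)
    syy = sum((b - my) ** 2 for b in y)
    if sgg == 0 or syy == 0:
        return 0.0, 1.0
    sgy = sum((a - mg) * (b - my) for a, b in zip(g, y))
    z = sgy / math.sqrt(sgg * syy) * math.sqrt(n)
    return sgy / sgg, 2 * NORMAL.cdf(-abs(z))


def simulate(m=200, n_causal=20, n_disc=2000, n_target=4000, h2=0.3,
             prevalence=0.5, prs_p=0.05, alpha=0.05, seed=1):
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.5) for _ in range(m)]
    causal = rng.sample(range(m), n_causal)
    effects = {j: rng.choice((-1, 1)) * math.sqrt(h2 / n_causal) for j in causal}

    # 1) Discovery GWAS of the disorder.
    G_disc = simulate_genotypes(n_disc, freqs, rng)
    y_disc = liability_phenotype(G_disc, freqs, effects, h2, prevalence, rng)
    disc = [assoc_test(G_disc, y_disc, j) for j in range(m)]

    # 2) PRS weights by p-value thresholding (SNPs are independent, so no pruning step).
    weights = {j: b for j, (b, p) in enumerate(disc) if p < prs_p}

    # 3) Independent target cohort (0% overlap): score everyone, keep bottom/top 5%.
    G_tgt = simulate_genotypes(n_target, freqs, rng)
    prs = [sum(w * row[j] for j, w in weights.items()) for row in G_tgt]
    order = sorted(range(n_target), key=prs.__getitem__)
    k = n_target // 20
    G_sel = [G_tgt[i] for i in order[:k] + order[-k:]]
    y_sel = [0] * k + [1] * k  # bottom 5% = controls, top 5% = cases

    # 4) GWAS of the PRS-derived phenotype: false positive rate among null SNPs,
    #    split by whether the SNP contributed to the PRS.
    nulls = [j for j in range(m) if j not in effects]
    used = [j for j in nulls if j in weights]
    unused = [j for j in nulls if j not in weights]

    def fpr(snps):
        hits = sum(assoc_test(G_sel, y_sel, j)[1] < alpha for j in snps)
        return hits / len(snps) if snps else float("nan")

    return fpr(used), fpr(unused), len(used), len(unused)


if __name__ == "__main__":
    fpr_used, fpr_unused, n_used, n_unused = simulate()
    print(f"null SNPs in PRS: {n_used}, false positive rate = {fpr_used:.2f}")
    print(f"null SNPs not in PRS: {n_unused}, false positive rate = {fpr_unused:.2f}")
```

The inflation arises because each null SNP in the PRS carries part of the score that defined case status, so selecting the PRS extremes also selects on that SNP's genotype.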
In a recent study, a polygenic risk score (PRS) for Alzheimer's disease was used to construct a new phenotype for a subsequent genome-wide association study (GWAS). Here we show that the applied method, in which the same genetic variants are used both to construct the PRS-derived phenotype and to assess their effect in a GWAS of that phenotype, leads to inflated false positive rates. We illustrate this bias by simulation. We first simulate an initial discovery cohort and run a GWAS of a disorder like Alzheimer's disease. We then simulate a target cohort, in which we construct a PRS based on the initial GWAS results. Following the published study, we select the bottom and top 5% of individuals in the PRS distribution and define them as controls and cases, respectively. Lastly, we run a GWAS on the new PRS-derived phenotype using all genetic variants. We show that at a significance threshold of 5 × 10⁻⁸, false positive rates are inflated up to 0.004 (an 80,000-fold increase compared with 5 × 10⁻⁸). We also show that such inflation can be prevented by excluding, from the GWAS of the PRS-derived phenotype, all variants used to construct the PRS as well as all variants in linkage disequilibrium with them.
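The remedy described in the last sentence can be sketched as a filtering step applied before the second GWAS. This sketch assumes individual-level allele dosages are available for every SNP and measures LD as the squared correlation of dosages. The function names are hypothetical, and the r² cutoff of 0.1 is an arbitrary illustrative choice, not a value from the study. In practice, LD would more likely be computed within a window around each PRS SNP, often from a reference panel, rather than across all pairs.

```python
from statistics import fmean


def dosage_r2(a, b):
    """Squared Pearson correlation between two allele-dosage vectors, a common proxy for LD r^2."""
    ma, mb = fmean(a), fmean(b)
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # monomorphic SNP: no information about LD
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab * sab / (saa * sbb)


def snps_to_test(dosages, prs_snps, r2_max=0.1):
    """Return SNPs eligible for the GWAS of a PRS-derived phenotype.

    Drops every SNP used in the PRS and every SNP whose r^2 with any PRS SNP exceeds r2_max.
    """
    prs_present = [s for s in prs_snps if s in dosages]
    keep = []
    for snp, g in dosages.items():
        if snp in prs_snps:
            continue
        if any(dosage_r2(g, dosages[s]) > r2_max for s in prs_present):
            continue
        keep.append(snp)
    return keep


if __name__ == "__main__":
    # Toy dosages for six individuals; rs1 is the only PRS SNP (hypothetical IDs).
    dosages = {
        "rs1": [0, 1, 2, 0, 1, 2],
        "rs2": [0, 1, 2, 0, 1, 2],  # identical to rs1: r^2 = 1, excluded
        "rs3": [0, 0, 0, 2, 2, 2],  # uncorrelated with rs1: r^2 = 0, kept
        "rs4": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with rs1: r^2 = 1, excluded
    }
    print(snps_to_test(dosages, {"rs1"}))  # ['rs3']
```

Using r² rather than r treats perfectly anti-correlated SNPs (such as rs4) as being in full LD, which matches how LD is usually thresholded.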