We performed a multistage genome-wide association study (GWAS) including 7,683 individuals with pancreatic cancer and 14,397 controls of European descent. Four new loci reached genome-wide significance: rs6971499 at 7q32.3 (LINC-PINT; per-allele odds ratio [OR] = 0.79; 95% confidence interval [CI] = 0.74–0.84; P = 3.0×10−12), rs7190458 at 16q23.1 (BCAR1/CTRB1/CTRB2; OR = 1.46; 95% CI = 1.30–1.65; P = 1.1×10−10), rs9581943 at 13q12.2 (PDX1; OR = 1.15; 95% CI = 1.10–1.20; P = 2.4×10−9), and rs16986825 at 22q12.1 (ZNRF3; OR = 1.18; 95% CI = 1.12–1.25; P = 1.2×10−8). An independent signal was identified in exon 2 of TERT at the established region 5p15.33 (rs2736098; OR = 0.80; 95% CI = 0.76–0.85; P = 9.8×10−14). We also identified a locus at 8q24.21 (rs1561927; P = 1.3×10−7) that approached genome-wide significance located 455 kb telomeric of PVT1. Our study has identified multiple new susceptibility alleles for pancreatic cancer worthy of follow-up studies.
DNA methylation is an important epigenetic mechanism that has been linked to complex disease and is of great interest to researchers as a potential link between genome, environment, and disease. As the scale of DNA methylation association studies approaches that of genome-wide association studies (GWAS), issues such as population stratification will need to be addressed. It is well-documented that failure to adjust for population stratification can lead to false positives in genetic association studies, but population stratification is often unaccounted for in DNA methylation studies. Here, we propose several approaches to correct for population stratification using principal components from different subsets of genome-wide methylation data. We first illustrate the potential for confounding due to population stratification by demonstrating widespread associations between DNA methylation and race in 388 individuals (365 African American and 23 Caucasian). We subsequently evaluate the performance of our principal-components approaches and other methods in adjusting for confounding due to population stratification. Our simulations show that 1) all of the methods considered are effective at removing inflation due to population stratification, and 2) maximum power can be obtained with SNP-based principal components, followed by methylation-based principal components, which out-perform both surrogate variable analysis and genomic control. Among our different approaches to computing methylation-based principal components, we find that principal components based on CpG sites chosen for their potential to proxy nearby SNPs can provide a powerful and computationally efficient approach to adjustment for population stratification in DNA methylation studies when genome-wide SNP data are unavailable.
The breast cancer risk variants identified in genome-wide association studies explain only a small fraction of the familial relative risk, and the genes responsible for these associations remain largely unknown. To identify novel risk loci and likely causal genes, we performed a transcriptome-wide association study evaluating associations of genetically predicted gene expression with breast cancer risk in 122,977 cases and 105,974 controls of European ancestry. We used data from the Genotype-Tissue Expression Project to establish genetic models to predict gene expression in breast tissue and evaluated model performance using data from The Cancer Genome Atlas. Of the 8,597 genes evaluated, significant associations were identified for 48 at a Bonferroni-corrected threshold of P < 5.82 × 10, including 14 genes at loci not yet reported for breast cancer. We silenced 13 genes and showed an effect for 11 on cell proliferation and/or colony-forming efficiency. Our study provides new insights into breast cancer genetics and biology.
CpGassoc is implemented in R and is freely available at http://genetics.emory.edu/conneely; we are in the process of submitting it to CRAN.
Mediation analysis helps researchers assess whether part or all of an exposure’s effect on an outcome is due to an intermediate variable. The indirect effect can help in designing interventions on the mediator as opposed to the exposure and better understanding the outcome’s mechanisms. Mediation analysis has seen increased use in genome-wide epidemiological studies to test for an exposure of interest being mediated through a genomic measure such as gene expression or DNA methylation. Testing for the indirect effect is challenged by the fact that the null hypothesis is composite. We examined the performance of commonly used mediation testing methods for the indirect effect in genome-wide mediation studies. When there is no association between the exposure and the mediator and no association between the mediator and the outcome, we show that these common tests are overly conservative. This is a case that will arise frequently in genome-wide mediation studies. Caution is hence needed when applying the commonly used mediation tests in genome-wide mediation studies. We evaluated the performance of these methods using simulation studies, and performed an epigenome-wide mediation association study in the Normative Aging Study, analyzing DNA methylation as a mediator of the effect of pack-years on FEV1.
Integrating genome-wide association (GWAS) and expression quantitative trait locus (eQTL) data into transcriptome-wide association studies (TWAS) based on predicted expression can boost power to detect novel disease loci or pinpoint the susceptibility gene at a known disease locus. However, it is often the case that multiple eQTL genes colocalize at disease loci, making the identification of the true susceptibility gene challenging, due to confounding through linkage disequilibrium (LD). To distinguish between true susceptibility genes (where the genetic effect on phenotype is mediated through expression) and colocalization due to LD, we examine an extension of the Mendelian Randomization Egger regression method that allows for LD while only requiring summary association data for both GWAS and eQTL. We derive the standard TWAS approach in the context of Mendelian Randomization and show in simulations that the standard TWAS does not control Type I error for causal gene identification when eQTLs have pleiotropic or LD-confounded effects on disease. In contrast, LD Aware MR-Egger regression can control Type I error in this case while attaining similar power as other methods in situations where these provide valid tests. However, when the direct effects of genetic variants on traits are correlated with the eQTL associations, all of the methods we examined including LD Aware MR-Egger regression can have inflated Type I error. We illustrate these methods by integrating gene expression within a recent large-scale breast cancer GWAS to provide guidance on susceptibility gene identification.
Colorectal cancer (CRC) is a biologically heterogeneous disease. To characterize its mutational profile, we conduct targeted sequencing of 205 genes for 2,105 CRC cases with survival data. Our data shows several findings in addition to enhancing the existing knowledge of CRC. We identify PRKCI , SPZ1 , MUTYH , MAP2K4 , FETUB , and TGFBR2 as additional genes significantly mutated in CRC. We find that among hypermutated tumors, an increased mutation burden is associated with improved CRC-specific survival (HR = 0.42, 95% CI: 0.21–0.82). Mutations in TP53 are associated with poorer CRC-specific survival, which is most pronounced in cases carrying TP53 mutations with predicted 0% transcriptional activity (HR = 1.53, 95% CI: 1.21–1.94). Furthermore, we observe differences in mutational frequency of several genes and pathways by tumor location, stage, and sex. Overall, this large study provides deep insights into somatic mutations in CRC, and their potential relationships with survival and tumor features.
Sleep disordered breathing (SDB)-related overnight hypoxemia is associated with cardiometabolic disease and other comorbidities. Understanding the genetic bases for variations in nocturnal hypoxemia may help understand mechanisms influencing oxygenation and SDB-related mortality. We conducted genome-wide association tests across 10 cohorts and 4 populations to identify genetic variants associated with three correlated measures of overnight oxyhemoglobin saturation: average and minimum oxyhemoglobin saturation during sleep and the percent of sleep with oxyhemoglobin saturation under 90%. The discovery sample consisted of 8,326 individuals. Variants with p < 1 × 10 −6 were analyzed in a replication group of 14,410 individuals. We identified 3 significantly associated regions, including 2 regions in multi-ethnic analyses (2q12, 10q22). SNPs in the 2q12 region associated with minimum SpO 2 (rs78136548 p = 2.70 × 10 −10 ). SNPs at 10q22 were associated with all three traits including average SpO 2 (rs72805692 p = 4.58 × 10 −8 ). SNPs in both regions were associated in over 20,000 individuals and are supported by prior associations or functional evidence. Four additional significant regions were detected in secondary sex-stratified and combined discovery and replication analyses, including a region overlapping Reelin, a known marker of respiratory complex neurons.These are the first genome-wide significant findings reported for oxyhemoglobin saturation during sleep, a phenotype of high clinical interest. Our replicated associations with HK1 and IL18R1 suggest that variants in inflammatory pathways, such as the biologically-plausible NLRP3 inflammasome, may contribute to nocturnal hypoxemia.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.