Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
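To make the idea of a "posterior mean effect size" concrete, here is a minimal sketch of the infinitesimal-prior special case only. It assumes standardized genotypes, marginal effect estimates from summary statistics, and a Gaussian prior in which heritability is spread evenly over all markers. The function names and the toy numbers are hypothetical, and this is not the full method described in the abstract (which also covers non-infinitesimal priors); it only illustrates how LD-aware shrinkage differs from per-marker shrinkage.

```python
import numpy as np

def posterior_mean_unlinked(beta_hat, n, m, h2):
    """Per-marker shrinkage assuming no LD and an infinitesimal Gaussian prior.

    beta_hat: marginal effect estimates (standardized scale, assumed)
    n: training sample size, m: number of markers, h2: assumed heritability
    """
    shrink = h2 / (h2 + m / n)
    return shrink * np.asarray(beta_hat, dtype=float)

def posterior_mean_with_ld(beta_hat, ld, n, m, h2):
    """LD-aware shrinkage: solve (m / (n * h2) * I + D) x = beta_hat.

    ld: LD (correlation) matrix D for the markers in beta_hat, e.g. estimated
    from a reference panel. m is the total marker count used in the prior.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    k = beta_hat.shape[0]
    a = (m / (n * h2)) * np.eye(k) + np.asarray(ld, dtype=float)
    return np.linalg.solve(a, beta_hat)

if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    beta_hat = np.array([0.020, 0.018, -0.005])
    ld = np.array([[1.0, 0.8, 0.0],
                   [0.8, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
    print(posterior_mean_unlinked(beta_hat, n=50_000, m=3, h2=0.3))
    print(posterior_mean_with_ld(beta_hat, ld, n=50_000, m=3, h2=0.3))
```

When the LD matrix is the identity, the two functions agree, which is a quick sanity check on the sketch. With correlated markers, the LD-aware version splits a shared signal between them instead of counting it twice.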
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry 1-3. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific 4-10. Additionally, effect sizes and their derived risk prediction scores derived in one population may
Though discovered over 100 years ago, the molecular foundation of sporadic Alzheimer's disease (AD) remains elusive. To better characterize the complex nature of AD, we constructed multiscale causal networks on a large human AD multi-omics dataset, integrating clinical features of AD, DNA variation, and gene and protein expression. These probabilistic causal models enabled detection, prioritization and replication of high-confidence master regulators of AD-associated networks, including the top predicted regulator, VGF. Overexpression of neuropeptide precursor VGF in 5xFAD mice partially rescued beta-amyloid-mediated memory impairment and neuropathology. Molecular validation of network predictions downstream of VGF was also achieved in this AD model, with significant enrichment for homologous genes identified as differentially expressed in 5xFAD brains overexpressing VGF. Our findings support a causal role for VGF in protecting against AD pathogenesis and progression.
Background: Pathogenic variants in BRCA1 and BRCA2 (BRCA1/2) lead to increased risk of breast, ovarian, and other cancers, but most variant-positive individuals in the general population are unaware of their risk, and little is known about prevalence in non-European populations. We investigated BRCA1/2 prevalence and impact in the electronic health record (EHR)-linked BioMe Biobank in New York City. Methods: Exome sequence data from 30,223 adult BioMe participants were evaluated for pathogenic variants in BRCA1/2. Prevalence estimates were made in population groups defined by genetic ancestry and self-report. EHR data were used to evaluate clinical characteristics of variant-positive individuals. Results: There were 218 (0.7%) individuals harboring expected pathogenic variants, resulting in an overall prevalence of 1 in 139. The highest prevalence was in individuals with Ashkenazi Jewish (AJ; 1 in 49), Filipino and other Southeast Asian (1 in 81), and non-AJ European (1 in 103) ancestry. Among 218 variant-positive individuals, 112 (51.4%) harbored known founder variants: 80 had AJ founder variants (BRCA1 c.5266dupC and c.68_69delAG, and BRCA2 c.5946delT), 8 had a Puerto Rican founder variant (BRCA2 c.3922G>T), and 24 had one of 19 other founder variants. Non-European populations were more likely to harbor BRCA1/2 variants that were not classified in ClinVar or that had uncertain or conflicting evidence for pathogenicity (uncertain/conflicting). Within mixed ancestry populations, such as Hispanic/Latinos with genetic ancestry from Africa, Europe, and the Americas, there was a strong correlation between the proportion of African genetic ancestry and the likelihood of harboring an uncertain/conflicting variant. Approximately 28% of variant-positive individuals had a personal history, and 45% had a personal or family history of BRCA1/2-associated cancers. Approximately 27% of variant-positive individuals had prior clinical genetic testing for BRCA1/2. However, individuals with AJ founder variants were twice as likely to have had a clinical test (39%) as those with other pathogenic variants (20%). Conclusions: These findings deepen our knowledge about BRCA1/2 variants and associated cancer risk in diverse populations, indicate a gap in knowledge about potential cancer-related variants in non-European populations, and suggest that genomic screening in diverse patient populations may be an effective tool to identify at-risk individuals.
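The headline figures in the Results can be re-derived from the counts given in the abstract. The short sketch below does only that arithmetic; the helper name `one_in_n` is hypothetical, and every number comes from the abstract itself.

```python
def one_in_n(carriers, total):
    """Express a prevalence as '1 in N', rounding N to the nearest integer."""
    return round(total / carriers)

carriers, total = 218, 30_223           # variant-positive / sequenced participants
founder = 80 + 8 + 24                   # AJ + Puerto Rican + other founder variants

print(f"prevalence: {carriers / total:.1%}")           # 0.7%
print(f"overall: 1 in {one_in_n(carriers, total)}")    # 1 in 139
print(f"founder variants: {founder} ({founder / carriers:.1%})")  # 112 (51.4%)
print(f"testing ratio, AJ founder vs other: {39 / 20:.2f}")       # 1.95
```

The last line shows that "twice as likely" is a rounded description of the 39% versus 20% testing rates.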
Highlights
• Genomic data linked to health records capture demography in health systems
• Genetic networks reveal recent common ancestry in diverse populations
• Evidence of many founder populations in New York City
• Fine-scale population structure impacts genetic risk predictions
Heritability is essential for understanding the biological causes of disease but requires laborious patient recruitment and phenotype ascertainment. Electronic health records (EHRs) passively capture a wide range of clinically relevant data and provide a resource for studying the heritability of traits that are not typically accessible. EHRs contain next-of-kin information collected via patient emergency contact forms, but until now, these data have gone unused in research. We mined emergency contact data at three academic medical centers and identified 7.4 million familial relationships while maintaining patient privacy. Identified relationships were consistent with genetically derived relatedness. We used EHR data to compute heritability estimates for 500 disease phenotypes. Overall, estimates were consistent with the literature and between sites. Inconsistencies were indicative of limitations and opportunities unique to EHR research. These analyses provide a validation of the use of EHRs for genetics and disease research.
Data availability: Summary statistics generated by the COVID-19 Host Genetics Initiative are available online (https://www.covid19hg.org/results/r6/). The analyses described here use the freeze 6 data. The COVID-19 Host Genetics Initiative continues to regularly release new data freezes. Summary statistics for samples from individuals of non-European ancestry are not currently available owing to the small individual sample sizes of these groups, but the results for the lead variants at 23 loci are reported in Supplementary Table 3. Individual-level data can be requested directly from the authors of the contributing studies, listed in Supplementary Table 1.
Self-reported ancestry, genetically determined ancestry, and APOL1 polymorphisms are associated with variation in kidney function and related disease risk, but the relative importance of these factors remains unclear. We estimated the global proportion of African ancestry for 9048 individuals at Mount Sinai Medical Center in Manhattan (3189 African Americans, 1721 European Americans, and 4138 Hispanic/Latino Americans by self-report) using genome-wide genotype data. CKD-EPI eGFR and genotypes of three APOL1 coding variants were available. In admixed African Americans and Hispanic/Latino Americans, serum creatinine values increased as African ancestry increased (per 10% increase in African ancestry, creatinine values increased 1% in African Americans and 0.9% in Hispanic/Latino Americans; P ≤ 1×10⁻⁷). eGFR was likewise significantly associated with African genetic ancestry in both populations. In contrast, APOL1 risk haplotypes were significantly associated with CKD, eGFR <45 ml/min per 1.73 m², and ESRD, with effects increasing with worsening disease states and the contribution of genetic African ancestry decreasing in parallel. Using genetic ancestry in the eGFR equation to reclassify patients as black on the basis of ≥50% African ancestry resulted in higher eGFR for 14.7% of Hispanic/Latino Americans and lower eGFR for 4.1% of African Americans, affecting CKD staging in 4.3% and 1% of participants, respectively. Reclassified individuals had electrolyte values consistent with their newly assigned CKD stage. In summary, proportion of African ancestry was significantly associated with normal-range creatinine and eGFR, whereas APOL1 risk haplotypes drove the associations with CKD. Recalculation of eGFR on the basis of genetic ancestry affected CKD staging and warrants additional investigation.
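The reclassification step can be sketched as follows. The abstract says only "CKD-EPI eGFR"; this sketch assumes that means the 2009 CKD-EPI creatinine equation, which includes a race term, and it uses the standard eGFR stage categories (G1 to G5). The function names, the `threshold` parameter, and the example patient are hypothetical and are not taken from the study.

```python
def ckd_epi_2009(scr_mg_dl, age, female, black):
    """2009 CKD-EPI creatinine equation (assumed), in ml/min per 1.73 m^2."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient of the 2009 equation
    return egfr

def egfr_by_ancestry(scr_mg_dl, age, female, african_ancestry, threshold=0.5):
    """Apply the race term when the genetic African ancestry proportion >= threshold."""
    return ckd_epi_2009(scr_mg_dl, age, female, black=african_ancestry >= threshold)

def ckd_stage(egfr):
    """Map eGFR to the standard G categories."""
    for cutoff, label in ((90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")):
        if egfr >= cutoff:
            return label
    return "G5"

if __name__ == "__main__":
    # Hypothetical patient: self-report would not apply the race term,
    # but the genetic ancestry proportion (0.62) would.
    before = ckd_epi_2009(1.4, 60, female=False, black=False)
    after = egfr_by_ancestry(1.4, 60, female=False, african_ancestry=0.62)
    print(ckd_stage(before), ckd_stage(after))
```

Because the race term is a fixed multiplier, reclassification changes the stage only for people whose eGFR sits near a stage boundary. That fits the abstract's report that eGFR changed for more participants than staging did.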